A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellation include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is a less desirable and possibly revenue-diminishing factor for hotels. Such losses are particularly high for last-minute cancellations.
The new technologies involving online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. Star Hotels Group has a chain of hotels in Portugal. They are facing problems with the high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a predictive model that can predict in advance which bookings are going to be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# this will help in making the Python code more structured automatically (good coding practice)
%load_ext nb_black
# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# To get different metric scores
from sklearn.metrics import (
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)
hotels = pd.read_csv("StarHotelsGroup.csv")
# copying data to another variable to avoid any changes to original data
data = hotels.copy()
data.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled |
data.tail()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 56921 | 2 | 1 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 4 | 45 | 2019 | 6 | 15 | Online | 0 | 0 | 0 | 163.88 | 1 | Not_Canceled |
| 56922 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 320 | 2019 | 5 | 15 | Offline | 0 | 0 | 0 | 90.00 | 1 | Canceled |
| 56923 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 56924 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 6 | 2019 | 4 | 28 | Online | 0 | 0 | 0 | 162.50 | 2 | Not_Canceled |
| 56925 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
data.shape
(56926, 18)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56926 entries, 0 to 56925
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          56926 non-null  int64
 1   no_of_children                        56926 non-null  int64
 2   no_of_weekend_nights                  56926 non-null  int64
 3   no_of_week_nights                     56926 non-null  int64
 4   type_of_meal_plan                     56926 non-null  object
 5   required_car_parking_space            56926 non-null  int64
 6   room_type_reserved                    56926 non-null  object
 7   lead_time                             56926 non-null  int64
 8   arrival_year                          56926 non-null  int64
 9   arrival_month                         56926 non-null  int64
 10  arrival_date                          56926 non-null  int64
 11  market_segment_type                   56926 non-null  object
 12  repeated_guest                        56926 non-null  int64
 13  no_of_previous_cancellations          56926 non-null  int64
 14  no_of_previous_bookings_not_canceled  56926 non-null  int64
 15  avg_price_per_room                    56926 non-null  float64
 16  no_of_special_requests                56926 non-null  int64
 17  booking_status                        56926 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 7.8+ MB
data[data.duplicated()].count()
no_of_adults                            14350
no_of_children                          14350
no_of_weekend_nights                    14350
no_of_week_nights                       14350
type_of_meal_plan                       14350
required_car_parking_space              14350
room_type_reserved                      14350
lead_time                               14350
arrival_year                            14350
arrival_month                           14350
arrival_date                            14350
market_segment_type                     14350
repeated_guest                          14350
no_of_previous_cancellations            14350
no_of_previous_bookings_not_canceled    14350
avg_price_per_room                      14350
no_of_special_requests                  14350
booking_status                          14350
dtype: int64
data.drop_duplicates(inplace=True)
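The duplicate check and removal above can be illustrated on a small frame. This is a minimal sketch on hypothetical toy values, not the hotel data: `duplicated()` flags every repeat of an earlier row, and `drop_duplicates()` keeps the first occurrence. The `reset_index(drop=True)` step is an optional addition that the notebook does not perform; without it the original row labels are retained. Since the columns shown include no booking identifier, identical rows could in principle be distinct bookings, so treating them as duplicates is an assumption.

```python
import pandas as pd

# Toy frame (hypothetical values): rows 0 and 2 are identical.
toy = pd.DataFrame(
    {
        "lead_time": [10, 45, 10, 7],
        "avg_price_per_room": [80.0, 95.5, 80.0, 120.0],
    }
)

# duplicated() marks each repeat of an earlier row; the first occurrence is not flagged.
n_duplicates = toy.duplicated().sum()

# drop_duplicates() keeps the first occurrence of each row.
# reset_index(drop=True) is optional and makes the index contiguous again.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates, deduped.shape)
```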
data.isnull().sum()
no_of_adults                            0
no_of_children                          0
no_of_weekend_nights                    0
no_of_week_nights                       0
type_of_meal_plan                       0
required_car_parking_space              0
room_type_reserved                      0
lead_time                               0
arrival_year                            0
arrival_month                           0
arrival_date                            0
market_segment_type                     0
repeated_guest                          0
no_of_previous_cancellations            0
no_of_previous_bookings_not_canceled    0
avg_price_per_room                      0
no_of_special_requests                  0
booking_status                          0
dtype: int64
# categorical column should be converted to categorical type
# (It reduces the data space required to store the dataframe,
# every class in the categorical column will be represented by a number under the hood.
# This is useful during model building)
data["type_of_meal_plan"] = data.type_of_meal_plan.astype("category")
data["room_type_reserved"] = data.room_type_reserved.astype("category")
data["market_segment_type"] = data.market_segment_type.astype("category")
data["booking_status"] = data.booking_status.astype("category")
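The memory claim in the comment above can be checked on a small example. The sketch below uses a hypothetical Series of repeated labels (not the hotel data) and compares the footprint of the `object` and `category` representations; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical column with a few repeated labels, similar in spirit to market_segment_type.
segment = pd.Series(["Online", "Offline", "Corporate", "Online"] * 10_000)

as_object = segment.astype("object")
as_category = segment.astype("category")

# deep=True counts the memory held by the Python strings, not just the pointers.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

# Each value is stored as a small integer code pointing into the list of categories.
codes = as_category.cat.codes

print(object_bytes, category_bytes, codes.dtype)
```

The saving comes from storing each distinct label once and keeping only integer codes per row, so it grows with the number of rows and shrinks as the number of distinct labels rises.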
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  category
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  category
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  category
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  category
dtypes: category(4), float64(1), int64(13)
memory usage: 5.0 MB
data.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42576.0 | 1.916737 | 0.527524 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42576.0 | 0.142146 | 0.459920 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42576.0 | 0.895270 | 0.887864 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42576.0 | 2.321167 | 1.519328 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| required_car_parking_space | 42576.0 | 0.034362 | 0.182160 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| lead_time | 42576.0 | 77.315953 | 77.279616 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42576.0 | 2018.297891 | 0.626126 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42576.0 | 6.365488 | 3.051924 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42576.0 | 15.682873 | 8.813991 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| repeated_guest | 42576.0 | 0.030886 | 0.173011 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42576.0 | 0.025413 | 0.358194 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42576.0 | 0.222731 | 2.242308 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42576.0 | 112.375800 | 40.865896 | 0.0 | 85.5 | 107.0 | 135.0 | 540.0 |
| no_of_special_requests | 42576.0 | 0.768109 | 0.837264 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
data.describe(include=["category"])
| | type_of_meal_plan | room_type_reserved | market_segment_type | booking_status |
|---|---|---|---|---|
| count | 42576 | 42576 | 42576 | 42576 |
| unique | 4 | 7 | 5 | 2 |
| top | Meal Plan 1 | Room_Type 1 | Online | Not_Canceled |
| freq | 31863 | 29730 | 34169 | 28089 |
cat_columns = [
    "type_of_meal_plan",
    "room_type_reserved",
    "market_segment_type",
    "booking_status",
]
for i in cat_columns:
    print(data[i].value_counts())
    print("*" * 50)
Meal Plan 1     31863
Not Selected     8716
Meal Plan 2      1989
Meal Plan 3         8
Name: type_of_meal_plan, dtype: int64
**************************************************
Room_Type 1    29730
Room_Type 4     9369
Room_Type 6     1540
Room_Type 5      906
Room_Type 2      718
Room_Type 7      307
Room_Type 3        6
Name: room_type_reserved, dtype: int64
**************************************************
Online           34169
Offline           5777
Corporate         1939
Complementary      496
Aviation           195
Name: market_segment_type, dtype: int64
**************************************************
Not_Canceled    28089
Canceled        14487
Name: booking_status, dtype: int64
**************************************************
Let us explore the numerical variables first
# function to plot a boxplot and a histogram along the same scale.
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
    """
    Boxplot and histogram combined
    data: dataframe
    feature: dataframe column
    figsize: size of figure (default (12,7))
    kde: whether to show the density curve (default False)
    bins: number of bins for histogram (default None)
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,  # Number of rows of the subplot grid = 2
        sharex=True,  # x-axis will be shared among all subplots
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )  # creating the 2 subplots
    sns.boxplot(
        data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
    )  # boxplot will be created and a marker will indicate the mean value of the column
    # For histogram: pass bins only when a value was given
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    ax_hist2.axvline(
        data[feature].mean(), color="green", linestyle="--"
    )  # Add mean to the histogram
    ax_hist2.axvline(
        data[feature].median(), color="black", linestyle="-"
    )  # Add median to the histogram
Univariate Analysis of lead_time
histogram_boxplot(data, "lead_time")
Observations
Univariate Analysis of no_of_previous_bookings_not_canceled
histogram_boxplot(data, "no_of_previous_bookings_not_canceled")
Observations
Univariate Analysis of avg_price_per_room
histogram_boxplot(data, "avg_price_per_room")
Observations
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 2, 6))
    else:
        plt.figure(figsize=(n + 2, 6))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the percentage or count
    plt.show()  # show the plot
Univariate Analysis of no_of_adults
labeled_barplot(data, "no_of_adults", perc=True)
Observations
Univariate Analysis of no_of_children
labeled_barplot(data, "no_of_children", perc=True)
Observations
Univariate Analysis of no_of_weekend_nights
labeled_barplot(data, "no_of_weekend_nights", perc=True)
Observations
Univariate Analysis of no_of_week_nights
labeled_barplot(data, "no_of_week_nights", perc=True)
Observations
Univariate Analysis of required_car_parking_space
labeled_barplot(data, "required_car_parking_space", perc=True)
Observations
Univariate Analysis of arrival_year
labeled_barplot(data, "arrival_year", perc=True)
Observations
Univariate Analysis of arrival_month
labeled_barplot(data, "arrival_month", perc=True)
Observations
Univariate Analysis of arrival_date
labeled_barplot(data, "arrival_date", perc=True)
Observations
Univariate Analysis of repeated_guest
labeled_barplot(data, "repeated_guest", perc=True)
Observations
Univariate Analysis of no_of_previous_cancellations
labeled_barplot(data, "no_of_previous_cancellations", perc=True)
Observations
Univariate Analysis of no_of_special_requests
labeled_barplot(data, "no_of_special_requests", perc=True)
Observations
Let us now explore the categorical variables
Univariate Analysis of type_of_meal_plan
labeled_barplot(data, "type_of_meal_plan", perc=True)
Observations
Univariate Analysis of room_type_reserved
labeled_barplot(data, "room_type_reserved", perc=True)
Observations
Univariate Analysis of market_segment_type
labeled_barplot(data, "market_segment_type", perc=True)
Observations
Univariate Analysis of booking_status
labeled_barplot(data, "booking_status", perc=True)
Observations
Plot bivariate charts between numeric variables to understand their interaction with each other.
plt.figure(figsize=(15, 7))
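# correlations are defined only for numeric columns, so select them explicitly
sns.heatmap(data.select_dtypes(include="number").corr(), annot=True, vmin=-1, vmax=1, cmap="Spectral")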
plt.show()
Observations
sns.pairplot(data=data, diag_kind="kde")
plt.show()
Observations
plt.figure(figsize=(10, 5))
sns.barplot(data=data, x="market_segment_type", y="avg_price_per_room")
Observations
# crosstab
ax = pd.crosstab(data["repeated_guest"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("repeated_guest")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| repeated_guest | ||
| 0 | 35.086401 | 64.913599 |
| 1 | 0.760456 | 99.239544 |
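The row-percentage table above is produced by dividing each crosstab row by its total. As a hedged alternative, `pd.crosstab` accepts `normalize="index"`, which performs the same row-wise division. The sketch below shows this on hypothetical toy data rather than the hotel bookings.

```python
import pandas as pd

# Toy data (hypothetical): 4 first-time guests and 2 repeat guests.
toy = pd.DataFrame(
    {
        "repeated_guest": [0, 0, 0, 0, 1, 1],
        "booking_status": [
            "Canceled",
            "Not_Canceled",
            "Not_Canceled",
            "Canceled",
            "Not_Canceled",
            "Not_Canceled",
        ],
    }
)

# normalize="index" divides each row by its row total; multiply by 100 for percentages.
row_pct = (
    pd.crosstab(toy["repeated_guest"], toy["booking_status"], normalize="index") * 100
)

print(row_pct)
```

Each row of `row_pct` sums to 100, so the values can be read directly as the share of canceled and not-canceled bookings within each group.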
Observations
# crosstab
ax = pd.crosstab(data["no_of_special_requests"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("no_of_special_requests")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| no_of_special_requests | ||
| 0 | 45.516954 | 54.483046 |
| 1 | 27.910860 | 72.089140 |
| 2 | 21.767748 | 78.232252 |
| 3 | 0.000000 | 100.000000 |
| 4 | 0.000000 | 100.000000 |
| 5 | 0.000000 | 100.000000 |
Observations
# crosstab
ax = pd.crosstab(data["no_of_adults"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("no_of_adults")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| no_of_adults | ||
| 0 | 41.304348 | 58.695652 |
| 1 | 21.875000 | 78.125000 |
| 2 | 35.398629 | 64.601371 |
| 3 | 44.976433 | 55.023567 |
| 4 | 39.285714 | 60.714286 |
Observations
# crosstab
ax = pd.crosstab(data["no_of_children"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("no_of_children")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| no_of_children | ||
| 0 | 32.845953 | 67.154047 |
| 1 | 39.398672 | 60.601328 |
| 2 | 52.779438 | 47.220562 |
| 3 | 35.897436 | 64.102564 |
| 9 | 50.000000 | 50.000000 |
| 10 | 0.000000 | 100.000000 |
Observations
# crosstab
ax = pd.crosstab(data["no_of_weekend_nights"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("no_of_weekend_nights")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| no_of_weekend_nights | ||
| 0 | 31.577767 | 68.422233 |
| 1 | 33.686786 | 66.313214 |
| 2 | 36.848252 | 63.151748 |
| 3 | 53.181818 | 46.818182 |
| 4 | 68.518519 | 31.481481 |
| 5 | 70.000000 | 30.000000 |
| 6 | 67.741935 | 32.258065 |
| 7 | 100.000000 | 0.000000 |
| 8 | 100.000000 | 0.000000 |
Observations
# crosstab
ax = pd.crosstab(data["no_of_week_nights"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("no_of_week_nights")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| no_of_week_nights | ||
| 0 | 24.705041 | 75.294959 |
| 1 | 27.856226 | 72.143774 |
| 2 | 33.823529 | 66.176471 |
| 3 | 36.055901 | 63.944099 |
| 4 | 41.199226 | 58.800774 |
| 5 | 44.071856 | 55.928144 |
| 6 | 53.488372 | 46.511628 |
| 7 | 54.545455 | 45.454545 |
| 8 | 61.157025 | 38.842975 |
| 9 | 60.416667 | 39.583333 |
| 10 | 84.042553 | 15.957447 |
| 11 | 85.000000 | 15.000000 |
| 12 | 68.750000 | 31.250000 |
| 13 | 77.777778 | 22.222222 |
| 14 | 50.000000 | 50.000000 |
| 15 | 57.142857 | 42.857143 |
| 16 | 71.428571 | 28.571429 |
| 17 | 66.666667 | 33.333333 |
Observations
# crosstab
ax = pd.crosstab(data["required_car_parking_space"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("required_car_parking_space")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| required_car_parking_space | ||
| 0 | 34.799212 | 65.200788 |
| 1 | 12.303486 | 87.696514 |
Observations
plt.figure(figsize=(10, 5))
sns.boxplot(y="lead_time", x="booking_status", data=data)
plt.show()
Observations
# crosstab
ax = pd.crosstab(data["arrival_year"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("arrival_year")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| arrival_year | ||
| 2017 | 12.227074 | 87.772926 |
| 2018 | 31.510381 | 68.489619 |
| 2019 | 42.501207 | 57.498793 |
Observations
# crosstab
ax = pd.crosstab(data["arrival_month"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(10, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("arrival_month")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| arrival_month | ||
| 1 | 12.009456 | 87.990544 |
| 2 | 27.552786 | 72.447214 |
| 3 | 29.549951 | 70.450049 |
| 4 | 38.490655 | 61.509345 |
| 5 | 38.500460 | 61.499540 |
| 6 | 38.890253 | 61.109747 |
| 7 | 47.407407 | 52.592593 |
| 8 | 46.592620 | 53.407380 |
| 9 | 29.048086 | 70.951914 |
| 10 | 28.607043 | 71.392957 |
| 11 | 22.627737 | 77.372263 |
| 12 | 14.255765 | 85.744235 |
Observations
# crosstab
ax = pd.crosstab(data["arrival_date"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(20, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("arrival_date")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| arrival_date | ||
| 1 | 34.624146 | 65.375854 |
| 2 | 31.725050 | 68.274950 |
| 3 | 34.985836 | 65.014164 |
| 4 | 33.476703 | 66.523297 |
| 5 | 30.565248 | 69.434752 |
| 6 | 34.287759 | 65.712241 |
| 7 | 34.773371 | 65.226629 |
| 8 | 35.327635 | 64.672365 |
| 9 | 32.666190 | 67.333810 |
| 10 | 33.746356 | 66.253644 |
| 11 | 32.244898 | 67.755102 |
| 12 | 34.219734 | 65.780266 |
| 13 | 32.466620 | 67.533380 |
| 14 | 33.040702 | 66.959298 |
| 15 | 35.940803 | 64.059197 |
| 16 | 35.388928 | 64.611072 |
| 17 | 36.485580 | 63.514420 |
| 18 | 33.187773 | 66.812227 |
| 19 | 29.298487 | 70.701513 |
| 20 | 32.081911 | 67.918089 |
| 21 | 33.738385 | 66.261615 |
| 22 | 36.687797 | 63.312203 |
| 23 | 33.775351 | 66.224649 |
| 24 | 36.980831 | 63.019169 |
| 25 | 37.269939 | 62.730061 |
| 26 | 36.078965 | 63.921035 |
| 27 | 34.583902 | 65.416098 |
| 28 | 34.475375 | 65.524625 |
| 29 | 34.767523 | 65.232477 |
| 30 | 34.730056 | 65.269944 |
| 31 | 30.875000 | 69.125000 |
Observations
# crosstab
ax = pd.crosstab(data["no_of_previous_cancellations"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(20, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("no_of_previous_cancellations")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| no_of_previous_cancellations | ||
| 0 | 34.361056 | 65.638944 |
| 1 | 3.212851 | 96.787149 |
| 2 | 0.000000 | 100.000000 |
| 3 | 2.127660 | 97.872340 |
| 4 | 0.000000 | 100.000000 |
| 5 | 0.000000 | 100.000000 |
| 6 | 0.000000 | 100.000000 |
| 11 | 0.000000 | 100.000000 |
| 13 | 100.000000 | 0.000000 |
Observations
plt.figure(figsize=(40, 10))
sns.countplot(x="no_of_previous_bookings_not_canceled", hue="booking_status", data=data)
plt.show()
Observations
plt.figure(figsize=(10, 5))
sns.boxplot(y="avg_price_per_room", x="booking_status", data=data)
plt.show()
Observations
# crosstab
ax = pd.crosstab(data["type_of_meal_plan"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(20, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("type_of_meal_plan")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| type_of_meal_plan | ||
| Meal Plan 1 | 32.988105 | 67.011895 |
| Meal Plan 2 | 43.086978 | 56.913022 |
| Meal Plan 3 | 12.500000 | 87.500000 |
| Not Selected | 35.773291 | 64.226709 |
Observations
# crosstab
ax = pd.crosstab(data["room_type_reserved"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(20, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("room_type_reserved")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| room_type_reserved | ||
| Room_Type 1 | 31.029263 | 68.970737 |
| Room_Type 2 | 38.161560 | 61.838440 |
| Room_Type 3 | 33.333333 | 66.666667 |
| Room_Type 4 | 39.310492 | 60.689508 |
| Room_Type 5 | 40.507726 | 59.492274 |
| Room_Type 6 | 53.636364 | 46.363636 |
| Room_Type 7 | 35.830619 | 64.169381 |
Observations
# crosstab
ax = pd.crosstab(data["market_segment_type"], data["booking_status"]).apply(
lambda r: r / r.sum() * 100, axis=1
)
ax_1 = ax.plot.bar(figsize=(20, 5), stacked=True, rot=0)
display(ax)
plt.legend(loc="upper center", bbox_to_anchor=(0.1, 1.0), title="booking_status")
plt.xlabel("market_segment_type")
plt.ylabel("Percent Distribution")
for rec in ax_1.patches:
    height = rec.get_height()
    ax_1.text(
        rec.get_x() + rec.get_width() / 2,
        rec.get_y() + height / 2,
        "{:.0f}%".format(height),
        ha="center",
        va="bottom",
    )
plt.show()
| booking_status | Canceled | Not_Canceled |
|---|---|---|
| market_segment_type | ||
| Aviation | 16.923077 | 83.076923 |
| Complementary | 0.000000 | 100.000000 |
| Corporate | 8.612687 | 91.387313 |
| Offline | 13.917258 | 86.082742 |
| Online | 39.459744 | 60.540256 |
Observations
Let's see if there are any rows with both no_of_weekend_nights and no_of_week_nights as 0, and remove them.
len(data[(data["no_of_weekend_nights"] == 0) & (data["no_of_week_nights"] == 0)])
99
# drop these rows
data.drop(
    data[(data["no_of_weekend_nights"] == 0) & (data["no_of_week_nights"] == 0)].index,
    axis=0,
    inplace=True,
)
data.shape
(42477, 18)
Let's see if there are any rows with both no_of_adults and no_of_children as 0.
len(data[(data["no_of_adults"] == 0) & (data["no_of_children"] == 0)])
0
Let's see if there are any rows with no_of_adults equal to 0 and a non-zero no_of_children.
len(data[(data["no_of_adults"] == 0) & (data["no_of_children"] != 0)])
184
Since we do not know the age of the children or whether these bookings were made under special conditions, we will not remove these rows.
# let's plot the boxplots of these columns to check for outliers
plt.figure(figsize=(17, 1))
sns.boxplot(data=data, x="lead_time")
# let's plot the boxplots of these columns to check for outliers
plt.figure(figsize=(17, 1))
sns.boxplot(data=data, x="avg_price_per_room")
def treat_outliers(df, col):
    """
    treats outliers in a variable by clipping them to the boxplot whiskers
    df: dataframe
    col: str, name of the numerical variable
    """
    Q1 = df[col].quantile(0.25)  # 25th quantile
    Q3 = df[col].quantile(0.75)  # 75th quantile
    IQR = Q3 - Q1
    Lower_Whisker = Q1 - 1.5 * IQR
    Upper_Whisker = Q3 + 1.5 * IQR
    # all the values smaller than Lower_Whisker will be assigned the value of Lower_Whisker
    # all the values greater than Upper_Whisker will be assigned the value of Upper_Whisker
    df[col] = np.clip(df[col], Lower_Whisker, Upper_Whisker)
    return df
def treat_outliers_all(df, col_list):
    """
    treat outliers in all numerical variables
    col_list: list of numerical variables
    df: dataframe
    """
    for c in col_list:
        df = treat_outliers(df, c)
    return df
# treating the outliers
numerical_col = ["avg_price_per_room", "lead_time"]
data = treat_outliers_all(data, numerical_col)
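To make the whisker arithmetic concrete, here is a small worked example for lead_time. It is a sketch that assumes the quartiles reported by `data.describe()` earlier (16 and 118) still hold after the zero-night rows were dropped; the sample values in `lead_time` are hypothetical. The resulting upper whisker of 271 is consistent with row 4 of the data, whose lead_time is 277 in `data.head()` and 271 in the later `dummy_data.head()` output.

```python
import pandas as pd

# Quartiles of lead_time as reported by data.describe() earlier (assumed unchanged here).
q1, q3 = 16.0, 118.0
iqr = q3 - q1
lower_whisker = q1 - 1.5 * iqr
upper_whisker = q3 + 1.5 * iqr

# Hypothetical lead times; only 277 and 521 exceed the upper whisker.
lead_time = pd.Series([0, 53, 224, 277, 521])

# Values outside [lower_whisker, upper_whisker] are replaced by the nearest bound.
clipped = lead_time.clip(lower=lower_whisker, upper=upper_whisker)

print(lower_whisker, upper_whisker, clipped.tolist())
```

Because lead times cannot be negative, the lower whisker of -137 never binds here; clipping only affects the long right tail.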
# let's plot the boxplots of these columns to check for outliers
plt.figure(figsize=(17, 1))
sns.boxplot(data=data, x="lead_time")
# let's plot the boxplots of these columns to check for outliers
plt.figure(figsize=(17, 1))
sns.boxplot(data=data, x="avg_price_per_room")
Let's convert the booking_status variable to a numerical variable (1 for canceled, 0 for not canceled).
data["booking_status"] = data["booking_status"].apply(
    lambda x: 1 if x == "Canceled" else 0
)
Before we proceed to build a model, we'll have to encode categorical features.
# creating dummy variables
dummy_data = pd.get_dummies(
    data,
    columns=["type_of_meal_plan", "room_type_reserved", "market_segment_type"],
    drop_first=True,
)
dummy_data.head()
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | required_car_parking_space | lead_time | arrival_year | arrival_month | arrival_date | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | 0 | 224 | 2017 | 10 | 2 | 0 | 0 | 0 | 65.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 2 | 0 | 2 | 3 | 0 | 5 | 2018 | 11 | 6 | 0 | 0 | 0 | 106.68 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 1 | 0 | 2 | 1 | 0 | 1 | 2018 | 2 | 28 | 0 | 0 | 0 | 60.00 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | 0 | 0 | 2 | 0 | 211 | 2018 | 5 | 20 | 0 | 0 | 0 | 100.00 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 3 | 0 | 0 | 3 | 0 | 271 | 2019 | 7 | 13 | 0 | 0 | 0 | 89.10 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
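As a small illustration of what `drop_first=True` does, the sketch below encodes a toy column. The category values are hypothetical stand-ins; the point is only that the first category (in sorted order) gets no column of its own and becomes the reference level against which the others are compared.

```python
import pandas as pd

# Toy frame with a single categorical column (hypothetical values)
toy = pd.DataFrame(
    {"type_of_meal_plan": ["Meal Plan 1", "Meal Plan 2", "Not Selected", "Meal Plan 1"]}
)

# drop_first=True drops the first category in sorted order ("Meal Plan 1" here),
# so k categories are encoded with k - 1 indicator columns
encoded = pd.get_dummies(toy, columns=["type_of_meal_plan"], drop_first=True)
print(encoded.columns.tolist())
```

A row with all indicators equal to 0 therefore belongs to the dropped reference category.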
We'll split the data into train and test to be able to evaluate the model that we build on the train data.
X = dummy_data.drop("booking_status", axis=1) # Features
y = dummy_data["booking_status"] # Labels (Target Variable)
# Splitting data into training and test set:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print(X_train.shape, X_test.shape)
(29733, 27) (12744, 27)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 29733
Number of rows in test data = 12744
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0    0.661588
1    0.338412
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.652935
1    0.347065
Name: booking_status, dtype: float64
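The share of canceled bookings differs slightly between the two sets (about 33.8% vs 34.7%). If identical class shares are wanted, one option is the `stratify` argument of `train_test_split`. The sketch below uses synthetic data with a similar class balance; `X_toy` and `y_toy` are hypothetical names, and this is an optional variation rather than what the notebook does.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 34% positives, 66% negatives (hypothetical)
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(1000, 3))
y_toy = np.array([1] * 340 + [0] * 660)

# stratify=y_toy keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1, stratify=y_toy
)
print("share of class 1 in train:", y_tr.mean())
print("share of class 1 in test:", y_te.mean())
```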
The model can make two kinds of wrong predictions:
Predicting that a customer will not cancel the booking when in reality the customer cancels (false negative): loss of revenue.
Predicting that a customer will cancel the booking when in reality the customer does not cancel (false positive): the hotel spends more money on marketing and reduces prices unnecessarily.
Recall should be maximized: the greater the recall, the higher the chances of minimizing the false negatives.
First, let's create functions to calculate different metrics and the confusion matrix so that we don't have to repeat the same code for each model.
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
## Function to calculate recall score
def get_recall_score(model, predictors, target):
    """
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    prediction = model.predict(predictors)
    return recall_score(target, prediction)
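To tie the metrics back to the evaluation criterion, here is a minimal worked example in plain Python. The labels and predictions are toy values; the helper `confusion_counts` is a hypothetical name introduced only for this illustration. It shows that recall falls exactly when false negatives (missed cancellations) rise.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = canceled."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


# Toy data: 4 real cancellations, 2 of which the model misses
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)  # share of real cancellations that were caught
precision = tp / (tp + fp)  # share of predicted cancellations that were real
print(tp, fp, fn, tn, recall, precision)
```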
from statsmodels.stats.outliers_influence import variance_inflation_factor
# we will define a function to check VIF
def checking_vif(predictors):
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns
    # calculating VIF for each feature
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
checking_vif(X_train)
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 20.409879 |
| 1 | no_of_children | 2.348806 |
| 2 | no_of_weekend_nights | 2.192104 |
| 3 | no_of_week_nights | 3.779718 |
| 4 | required_car_parking_space | 1.072203 |
| 5 | lead_time | 2.433912 |
| 6 | arrival_year | 261.122669 |
| 7 | arrival_month | 5.574838 |
| 8 | arrival_date | 4.175151 |
| 9 | repeated_guest | 2.108817 |
| 10 | no_of_previous_cancellations | 1.602796 |
| 11 | no_of_previous_bookings_not_canceled | 1.995322 |
| 12 | avg_price_per_room | 21.323715 |
| 13 | no_of_special_requests | 2.046411 |
| 14 | type_of_meal_plan_Meal Plan 2 | 1.140941 |
| 15 | type_of_meal_plan_Meal Plan 3 | 1.024754 |
| 16 | type_of_meal_plan_Not Selected | 1.587943 |
| 17 | room_type_reserved_Room_Type 2 | 1.119320 |
| 18 | room_type_reserved_Room_Type 3 | 1.001401 |
| 19 | room_type_reserved_Room_Type 4 | 1.851392 |
| 20 | room_type_reserved_Room_Type 5 | 1.144932 |
| 21 | room_type_reserved_Room_Type 6 | 2.181760 |
| 22 | room_type_reserved_Room_Type 7 | 1.152450 |
| 23 | market_segment_type_Complementary | 3.757636 |
| 24 | market_segment_type_Corporate | 12.024021 |
| 25 | market_segment_type_Offline | 34.638314 |
| 26 | market_segment_type_Online | 195.388423 |
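For intuition about what a VIF measures, the sketch below computes it by hand as 1 / (1 - R²), where R² comes from regressing one feature on the others. It uses synthetic data and a hypothetical helper `vif_manual`. Note the assumption: this sketch includes an intercept in each auxiliary regression, whereas `variance_inflation_factor` uses the design matrix exactly as it is passed, so the numbers in the table above (computed without a constant column) are not expected to match this formula one to one.

```python
import numpy as np


def vif_manual(X, i):
    """VIF of column i: regress it on the other columns (with intercept)."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)


# Synthetic features: c is almost a copy of a, b is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
X_toy = np.column_stack([a, b, c])

vifs = [vif_manual(X_toy, i) for i in range(X_toy.shape[1])]
print(vifs)
```

The two nearly collinear columns get large VIFs while the independent one stays close to 1, which is the pattern used above to decide which features to drop.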
X_train1 = X_train.drop(["arrival_year",], axis=1,)
checking_vif(X_train1)
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 20.311008 |
| 1 | no_of_children | 2.348672 |
| 2 | no_of_weekend_nights | 2.184122 |
| 3 | no_of_week_nights | 3.743997 |
| 4 | required_car_parking_space | 1.072202 |
| 5 | lead_time | 2.432830 |
| 6 | arrival_month | 5.518603 |
| 7 | arrival_date | 4.122374 |
| 8 | repeated_guest | 2.064069 |
| 9 | no_of_previous_cancellations | 1.602746 |
| 10 | no_of_previous_bookings_not_canceled | 1.992959 |
| 11 | avg_price_per_room | 20.549886 |
| 12 | no_of_special_requests | 2.044915 |
| 13 | type_of_meal_plan_Meal Plan 2 | 1.138549 |
| 14 | type_of_meal_plan_Meal Plan 3 | 1.024746 |
| 15 | type_of_meal_plan_Not Selected | 1.582757 |
| 16 | room_type_reserved_Room_Type 2 | 1.118535 |
| 17 | room_type_reserved_Room_Type 3 | 1.001400 |
| 18 | room_type_reserved_Room_Type 4 | 1.849935 |
| 19 | room_type_reserved_Room_Type 5 | 1.142597 |
| 20 | room_type_reserved_Room_Type 6 | 2.173124 |
| 21 | room_type_reserved_Room_Type 7 | 1.149852 |
| 22 | market_segment_type_Complementary | 1.328489 |
| 23 | market_segment_type_Corporate | 2.447333 |
| 24 | market_segment_type_Offline | 5.035240 |
| 25 | market_segment_type_Online | 29.620677 |
X_train2 = X_train1.drop(["avg_price_per_room",], axis=1,)
checking_vif(X_train2)
| | feature | VIF |
|---|---|---|
| 0 | no_of_adults | 19.027817 |
| 1 | no_of_children | 2.293127 |
| 2 | no_of_weekend_nights | 2.175730 |
| 3 | no_of_week_nights | 3.728549 |
| 4 | required_car_parking_space | 1.071022 |
| 5 | lead_time | 2.402572 |
| 6 | arrival_month | 5.392907 |
| 7 | arrival_date | 4.119909 |
| 8 | repeated_guest | 2.063750 |
| 9 | no_of_previous_cancellations | 1.602225 |
| 10 | no_of_previous_bookings_not_canceled | 1.992799 |
| 11 | no_of_special_requests | 2.040749 |
| 12 | type_of_meal_plan_Meal Plan 2 | 1.086024 |
| 13 | type_of_meal_plan_Meal Plan 3 | 1.024696 |
| 14 | type_of_meal_plan_Not Selected | 1.561102 |
| 15 | room_type_reserved_Room_Type 2 | 1.114061 |
| 16 | room_type_reserved_Room_Type 3 | 1.001362 |
| 17 | room_type_reserved_Room_Type 4 | 1.695970 |
| 18 | room_type_reserved_Room_Type 5 | 1.070637 |
| 19 | room_type_reserved_Room_Type 6 | 2.037480 |
| 20 | room_type_reserved_Room_Type 7 | 1.116264 |
| 21 | market_segment_type_Complementary | 1.313848 |
| 22 | market_segment_type_Corporate | 2.238825 |
| 23 | market_segment_type_Offline | 4.515688 |
| 24 | market_segment_type_Online | 22.614131 |
X_train3 = X_train2.drop(["no_of_adults",], axis=1,)
checking_vif(X_train3)
| | feature | VIF |
|---|---|---|
| 0 | no_of_children | 2.262371 |
| 1 | no_of_weekend_nights | 2.173338 |
| 2 | no_of_week_nights | 3.726513 |
| 3 | required_car_parking_space | 1.069917 |
| 4 | lead_time | 2.371605 |
| 5 | arrival_month | 5.392444 |
| 6 | arrival_date | 4.119034 |
| 7 | repeated_guest | 2.060639 |
| 8 | no_of_previous_cancellations | 1.602214 |
| 9 | no_of_previous_bookings_not_canceled | 1.992235 |
| 10 | no_of_special_requests | 2.027042 |
| 11 | type_of_meal_plan_Meal Plan 2 | 1.085932 |
| 12 | type_of_meal_plan_Meal Plan 3 | 1.024691 |
| 13 | type_of_meal_plan_Not Selected | 1.560676 |
| 14 | room_type_reserved_Room_Type 2 | 1.106463 |
| 15 | room_type_reserved_Room_Type 3 | 1.001362 |
| 16 | room_type_reserved_Room_Type 4 | 1.530218 |
| 17 | room_type_reserved_Room_Type 5 | 1.051149 |
| 18 | room_type_reserved_Room_Type 6 | 2.006950 |
| 19 | room_type_reserved_Room_Type 7 | 1.098817 |
| 20 | market_segment_type_Complementary | 1.222798 |
| 21 | market_segment_type_Corporate | 1.952380 |
| 22 | market_segment_type_Offline | 2.608723 |
| 23 | market_segment_type_Online | 11.259872 |
X_train4 = X_train3.drop(["arrival_month",], axis=1,)
checking_vif(X_train4)
| | feature | VIF |
|---|---|---|
| 0 | no_of_children | 2.262258 |
| 1 | no_of_weekend_nights | 2.173328 |
| 2 | no_of_week_nights | 3.726225 |
| 3 | required_car_parking_space | 1.069830 |
| 4 | lead_time | 2.340627 |
| 5 | arrival_date | 4.118532 |
| 6 | repeated_guest | 2.060639 |
| 7 | no_of_previous_cancellations | 1.601669 |
| 8 | no_of_previous_bookings_not_canceled | 1.992201 |
| 9 | no_of_special_requests | 2.013531 |
| 10 | type_of_meal_plan_Meal Plan 2 | 1.085931 |
| 11 | type_of_meal_plan_Meal Plan 3 | 1.024529 |
| 12 | type_of_meal_plan_Not Selected | 1.560381 |
| 13 | room_type_reserved_Room_Type 2 | 1.106374 |
| 14 | room_type_reserved_Room_Type 3 | 1.001234 |
| 15 | room_type_reserved_Room_Type 4 | 1.530201 |
| 16 | room_type_reserved_Room_Type 5 | 1.049571 |
| 17 | room_type_reserved_Room_Type 6 | 2.006181 |
| 18 | room_type_reserved_Room_Type 7 | 1.098807 |
| 19 | market_segment_type_Complementary | 1.173133 |
| 20 | market_segment_type_Corporate | 1.746916 |
| 21 | market_segment_type_Offline | 2.027193 |
| 22 | market_segment_type_Online | 8.523439 |
X_test4 = X_test[X_train4.columns].astype(float)
logit1 = sm.Logit(y_train, X_train4.astype(float))
lg1 = logit1.fit(
disp=False
) # setting disp=False will remove the information on number of iterations
print(lg1.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29733
Model: Logit Df Residuals: 29710
Method: MLE Df Model: 22
Date: Fri, 22 Oct 2021 Pseudo R-squ.: 0.3052
Time: 20:16:58 Log-Likelihood: -13220.
converged: False LL-Null: -19028.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
no_of_children 0.2458 0.047 5.177 0.000 0.153 0.339
no_of_weekend_nights -0.0017 0.018 -0.095 0.924 -0.036 0.033
no_of_week_nights 0.0648 0.010 6.316 0.000 0.045 0.085
required_car_parking_space -1.3609 0.114 -11.990 0.000 -1.583 -1.138
lead_time 0.0161 0.000 64.718 0.000 0.016 0.017
arrival_date -0.0050 0.002 -2.902 0.004 -0.008 -0.002
repeated_guest -3.9270 0.744 -5.280 0.000 -5.385 -2.469
no_of_previous_cancellations 0.3068 0.105 2.913 0.004 0.100 0.513
no_of_previous_bookings_not_canceled -0.0419 0.096 -0.438 0.661 -0.229 0.145
no_of_special_requests -1.2430 0.023 -54.083 0.000 -1.288 -1.198
type_of_meal_plan_Meal Plan 2 0.3608 0.074 4.905 0.000 0.217 0.505
type_of_meal_plan_Meal Plan 3 8.6767 16.261 0.534 0.594 -23.195 40.548
type_of_meal_plan_Not Selected 0.1615 0.041 3.954 0.000 0.081 0.242
room_type_reserved_Room_Type 2 -0.3746 0.123 -3.041 0.002 -0.616 -0.133
room_type_reserved_Room_Type 3 0.6230 1.561 0.399 0.690 -2.436 3.682
room_type_reserved_Room_Type 4 0.2964 0.039 7.590 0.000 0.220 0.373
room_type_reserved_Room_Type 5 0.6065 0.105 5.784 0.000 0.401 0.812
room_type_reserved_Room_Type 6 0.6191 0.111 5.587 0.000 0.402 0.836
room_type_reserved_Room_Type 7 0.7103 0.188 3.782 0.000 0.342 1.078
market_segment_type_Complementary -14.6186 19.321 -0.757 0.449 -52.487 23.250
market_segment_type_Corporate -2.0265 0.114 -17.841 0.000 -2.249 -1.804
market_segment_type_Offline -3.5723 0.071 -50.107 0.000 -3.712 -3.433
market_segment_type_Online -1.0307 0.048 -21.472 0.000 -1.125 -0.937
========================================================================================================
Observations
A negative coefficient means that the probability of cancelling the booking decreases as the corresponding attribute value increases.
A positive coefficient means that the probability of cancelling the booking increases as the corresponding attribute value increases.
The p-value of a variable indicates whether the variable is significant. If we take the significance level to be 0.05 (5%), any variable with a p-value less than 0.05 is considered significant.
Let's write a loop that fits the model, removes the variable with the highest p-value if it is greater than 0.05, and repeats until no variable has a p-value greater than 0.05.
# initial list of columns
cols = X_train4.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    x_train_aux = X_train4[cols]
    # fitting the model
    # (note: this is an OLS fit, so the p-values are those of a linear model, not of the Logit fit)
    model = sm.OLS(y_train, x_train_aux).fit()
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['no_of_children', 'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'no_of_previous_bookings_not_canceled', 'no_of_special_requests', 'type_of_meal_plan_Meal Plan 2', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 2', 'room_type_reserved_Room_Type 4', 'room_type_reserved_Room_Type 5', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Complementary', 'market_segment_type_Corporate', 'market_segment_type_Offline', 'market_segment_type_Online']
X_train5 = X_train4[selected_features]
X_test5 = X_test4[selected_features]
X_train5 holds the final set of features selected by the loop above. Let's fit a statsmodels Logit model on these features.
logit5 = sm.Logit(y_train, X_train5.astype(float))
lg5 = logit5.fit(
disp=False
) # setting disp=False will remove the information on number of iterations
print(lg5.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29733
Model: Logit Df Residuals: 29716
Method: MLE Df Model: 16
Date: Fri, 22 Oct 2021 Pseudo R-squ.: 0.3036
Time: 20:28:44 Log-Likelihood: -13251.
converged: False LL-Null: -19028.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
no_of_children 0.2457 0.047 5.178 0.000 0.153 0.339
no_of_week_nights 0.0637 0.010 6.410 0.000 0.044 0.083
required_car_parking_space -1.3611 0.113 -12.007 0.000 -1.583 -1.139
lead_time 0.0161 0.000 64.748 0.000 0.016 0.017
no_of_previous_bookings_not_canceled -1.3393 0.257 -5.214 0.000 -1.843 -0.836
no_of_special_requests -1.2421 0.023 -54.081 0.000 -1.287 -1.197
type_of_meal_plan_Meal Plan 2 0.3612 0.074 4.913 0.000 0.217 0.505
type_of_meal_plan_Not Selected 0.1597 0.041 3.913 0.000 0.080 0.240
room_type_reserved_Room_Type 2 -0.3779 0.123 -3.069 0.002 -0.619 -0.137
room_type_reserved_Room_Type 4 0.2896 0.039 7.434 0.000 0.213 0.366
room_type_reserved_Room_Type 5 0.5973 0.105 5.708 0.000 0.392 0.802
room_type_reserved_Room_Type 6 0.6117 0.111 5.528 0.000 0.395 0.829
room_type_reserved_Room_Type 7 0.7068 0.188 3.769 0.000 0.339 1.074
market_segment_type_Complementary -28.7841 1.03e+05 -0.000 1.000 -2.02e+05 2.02e+05
market_segment_type_Corporate -2.1452 0.110 -19.463 0.000 -2.361 -1.929
market_segment_type_Offline -3.6447 0.066 -55.636 0.000 -3.773 -3.516
market_segment_type_Online -1.1040 0.039 -28.019 0.000 -1.181 -1.027
========================================================================================================
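The elimination loop left no feature with a p-value greater than 0.05, so we'll consider the features in X_train5 as the final ones and lg5 as the final model. Note, however, that in the Logit summary above market_segment_type_Complementary still has a p-value of 1.000 and the fit reports converged: False, so that coefficient should be interpreted with caution.
The coefficients of some variables are positive: an increase in these leads to an increase in the chances of a booking being canceled.
The coefficients of some variables are negative: an increase in these leads to a decrease in the chances of a booking being canceled.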
# converting coefficients to odds
odds = np.exp(lg5.params)
# finding the percentage change
perc_change_odds = (np.exp(lg5.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train5.columns).T
| | no_of_children | no_of_week_nights | required_car_parking_space | lead_time | no_of_previous_bookings_not_canceled | no_of_special_requests | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 1.278538 | 1.065819 | 0.256368 | 1.016187 | 0.262038 | 0.288769 | 1.435031 | 1.173130 | 0.685332 | 1.335915 | 1.817174 | 1.843513 | 2.027485 | 3.156623e-13 | 0.117049 | 0.026130 | 0.331549 |
| Change_odd% | 27.853779 | 6.581891 | -74.363197 | 1.618696 | -73.796163 | -71.123088 | 43.503120 | 17.313049 | -31.466819 | 33.591539 | 81.717432 | 84.351304 | 102.748497 | -1.000000e+02 | -88.295098 | -97.386957 | -66.845080 |
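As a worked example of how to read this table, the sketch below redoes the conversion for two coefficients copied (rounded) from the lg5 summary. The helper names `odds_ratio` and `pct_change_in_odds` are hypothetical and only restate the formulas used above.

```python
import math

# Coefficients as printed (rounded) in the lg5 summary
coef_lead_time = 0.0161
coef_special_requests = -1.2421


def odds_ratio(coef):
    """Multiplicative change in the odds for a one-unit increase."""
    return math.exp(coef)


def pct_change_in_odds(coef):
    """Same change expressed as a percentage."""
    return (math.exp(coef) - 1) * 100


# One extra day of lead time multiplies the odds of cancellation by about 1.016
print(odds_ratio(coef_lead_time), pct_change_in_odds(coef_lead_time))
# One extra special request multiplies the odds by about 0.29
print(odds_ratio(coef_special_requests), pct_change_in_odds(coef_special_requests))
```

Holding the other features constant, each additional day of lead time raises the odds of cancellation by roughly 1.6%, while each additional special request lowers them by roughly 71%.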
# creating confusion matrix
confusion_matrix_statsmodels(lg5, X_train5, y_train)
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg5, X_train5, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.78169 | 0.58875 | 0.715718 | 0.646055 |
ROC-AUC on training set
logit_roc_auc_train = roc_auc_score(y_train, lg5.predict(X_train5))
fpr, tpr, thresholds = roc_curve(y_train, lg5.predict(X_train5))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
Let's see if the recall score can be improved further by changing the model threshold using the ROC curve.
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg5.predict(X_train5))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.3345983648005056
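The `np.argmax(tpr - fpr)` rule picks the threshold that maximizes the gap between the true positive rate and the false positive rate. The sketch below reproduces that rule by hand on toy scores, without `roc_curve`; the data and the helper `youden_threshold` are hypothetical and only illustrate the selection logic.

```python
import numpy as np

# Toy labels and predicted probabilities (hypothetical)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90])


def youden_threshold(y_true, scores):
    """Return the threshold maximizing tpr - fpr, and that maximum."""
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        pred = scores >= t  # class 1 when the score reaches the threshold
        tpr = (pred & (y_true == 1)).sum() / (y_true == 1).sum()
        fpr = (pred & (y_true == 0)).sum() / (y_true == 0).sum()
        j = tpr - fpr
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


best_t, best_j = youden_threshold(y_true, scores)
print(best_t, best_j)
```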
Checking model performance on training set
# creating confusion matrix
confusion_matrix_statsmodels(
lg5, X_train5, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg5, X_train5, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.770255 | 0.781853 | 0.629209 | 0.697275 |
y_scores = lg5.predict(X_train5)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.42
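The value 0.42 is read off the plot as the point where the precision and recall curves meet. As a rough sketch of how such a point could be located programmatically, the code below scans a grid of thresholds on synthetic scores and keeps the one with the smallest gap between precision and recall. The data, the grid, and the helper names are assumptions made for illustration.

```python
import numpy as np


def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when class 1 is predicted for scores > threshold."""
    pred = scores > threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn)
    return precision, recall


def balanced_threshold(y_true, scores, grid):
    """Threshold on the grid where |precision - recall| is smallest."""
    gaps = [
        abs(p - r)
        for p, r in (precision_recall_at(y_true, scores, t) for t in grid)
    ]
    return grid[int(np.argmin(gaps))]


# Synthetic scores: positives tend to score higher than negatives
rng = np.random.default_rng(42)
y_toy = rng.binomial(1, 0.34, size=2000)
scores_toy = np.clip(0.3 * y_toy + rng.normal(0.35, 0.15, size=2000), 0, 1)

grid = np.linspace(0.05, 0.95, 91)
best_t = balanced_threshold(y_toy, scores_toy, grid)
print(best_t, precision_recall_at(y_toy, scores_toy, best_t))
```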
Checking model performance on training set
# creating confusion matrix
confusion_matrix_statsmodels(lg5, X_train5, y_train, threshold=optimal_threshold_curve)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg5, X_train5, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.784785 | 0.685649 | 0.68071 | 0.683171 |
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
models_train_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.33 Threshold",
    "Logistic Regression-0.42 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression-default Threshold | Logistic Regression-0.33 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.781690 | 0.770255 | 0.784785 |
| Recall | 0.588750 | 0.781853 | 0.685649 |
| Precision | 0.715718 | 0.629209 | 0.680710 |
| F1 | 0.646055 | 0.697275 | 0.683171 |
Using model with default threshold
# creating confusion matrix
confusion_matrix_statsmodels(lg5, X_test5, y_test)
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg5, X_test5, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.780681 | 0.592358 | 0.72536 | 0.652147 |
ROC curve on test set
logit_roc_auc_test = roc_auc_score(y_test, lg5.predict(X_test5))
fpr, tpr, thresholds = roc_curve(y_test, lg5.predict(X_test5))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_test)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
Using model with threshold=0.33
# creating confusion matrix
confusion_matrix_statsmodels(lg5, X_test5, y_test, threshold=optimal_threshold_auc_roc)
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg5, X_test5, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.7664 | 0.781144 | 0.632321 | 0.698898 |
Using model with threshold = 0.42
# creating confusion matrix
confusion_matrix_statsmodels(lg5, X_test5, y_test, threshold=optimal_threshold_curve)
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg5, X_test5, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.780289 | 0.678725 | 0.685232 | 0.681963 |
# testing performance comparison
models_test_comp_df = pd.concat(
[
log_reg_model_test_perf.T,
log_reg_model_test_perf_threshold_auc_roc.T,
log_reg_model_test_perf_threshold_curve.T,
],
axis=1,
)
models_test_comp_df.columns = [
    "Logistic Regression-default Threshold",
    "Logistic Regression-0.33 Threshold",
    "Logistic Regression-0.42 Threshold",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Logistic Regression-default Threshold | Logistic Regression-0.33 Threshold | Logistic Regression-0.42 Threshold |
|---|---|---|---|
| Accuracy | 0.780681 | 0.766400 | 0.780289 |
| Recall | 0.592358 | 0.781144 | 0.678725 |
| Precision | 0.725360 | 0.632321 | 0.685232 |
| F1 | 0.652147 | 0.698898 | 0.681963 |
X = dummy_data.drop("booking_status", axis=1) # Features
y = dummy_data["booking_status"] # Labels (Target Variable)
# Splitting data into training and test set:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print(X_train.shape, X_test.shape)
(29733, 27) (12744, 27)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 29733
Number of rows in test data = 12744
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0    0.661588
1    0.338412
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.652935
1    0.347065
Name: booking_status, dtype: float64
model = DecisionTreeClassifier(criterion="gini", random_state=1)
model.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
Checking model performance on training set
decision_tree_perf_train = model_performance_classification_statsmodels(
model, X_train, y_train
)
decision_tree_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.9962 | 0.98877 | 1.0 | 0.994353 |
confusion_matrix_statsmodels(model, X_train, y_train)
Checking model performance on test set
decision_tree_perf_test = model_performance_classification_statsmodels(
model, X_test, y_test
)
decision_tree_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.793393 | 0.70269 | 0.702214 | 0.702452 |
confusion_matrix_statsmodels(model, X_test, y_test)
There is a large gap between the model's performance on the training set (recall 0.99) and on the test set (recall 0.70), which suggests that the model is overfitting.
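One common way to narrow such a train/test gap is to limit how deep the tree can grow. The sketch below shows the effect on synthetic data; the data, the variable names, and the choice of `max_depth=3` are assumptions made for illustration, not tuned values for the hotel dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy classification problem (hypothetical data)
rng = np.random.default_rng(1)
X_toy = rng.normal(size=(2000, 5))
noise = rng.normal(scale=1.0, size=2000)
y_toy = (X_toy[:, 0] + 0.5 * X_toy[:, 1] + noise > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=1
)

# Fully grown tree: memorizes the training data
full_tree = DecisionTreeClassifier(criterion="gini", random_state=1)
full_tree.fit(X_tr, y_tr)

# Depth-limited tree: less flexible, so it cannot fit the noise as closely
shallow_tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=1)
shallow_tree.fit(X_tr, y_tr)

for name, clf in [("full", full_tree), ("max_depth=3", shallow_tree)]:
    print(name, clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```

On data like this the fully grown tree scores perfectly on the training set but much lower on the test set, while the shallow tree gives similar scores on both.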
## creating a list of column names
feature_names = X_train.columns.to_list()
plt.figure(figsize=(20, 30))
out = tree.plot_tree(
model,
feature_names=feature_names,
filled=True,
fontsize=9,
node_ids=False,
class_names=None,
)
# below code will add arrows to the decision tree split if they are missing
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
The tree is very complex and difficult to interpret.
# Text report showing the rules of a decision tree -
print(tree.export_text(model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50
| |--- no_of_special_requests <= 0.50
| | |--- market_segment_type_Online <= 0.50
| | | |--- lead_time <= 86.50
| | | | |--- avg_price_per_room <= 202.50
| | | | | |--- no_of_weekend_nights <= 3.50
| | | | | | |--- repeated_guest <= 0.50
| | | | | | | |--- market_segment_type_Offline <= 0.50
| | | | | | | | |--- avg_price_per_room <= 59.50
| | | | | | | | | |--- weights: [102.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 59.50
| | | | | | | | | |--- avg_price_per_room <= 63.20
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 63.20
| | | | | | | | | | |--- lead_time <= 43.00
| | | | | | | | | | | |--- truncated branch of depth 27
| | | | | | | | | | |--- lead_time > 43.00
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | |--- market_segment_type_Offline > 0.50
| | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | |--- weights: [738.00, 0.00] class: 0
| | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | |--- lead_time <= 41.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- lead_time > 41.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | |--- lead_time <= 65.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- lead_time > 65.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | |--- repeated_guest > 0.50
| | | | | | | |--- lead_time <= 33.00
| | | | | | | | |--- weights: [354.00, 0.00] class: 0
| | | | | | | |--- lead_time > 33.00
| | | | | | | | |--- no_of_previous_bookings_not_canceled <= 10.50
| | | | | | | | | |--- weights: [25.00, 0.00] class: 0
| | | | | | | | |--- no_of_previous_bookings_not_canceled > 10.50
| | | | | | | | | |--- arrival_date <= 13.00
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 13.00
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- no_of_weekend_nights > 3.50
| | | | | | |--- avg_price_per_room <= 92.72
| | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | |--- avg_price_per_room > 92.72
| | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | |--- avg_price_per_room > 202.50
| | | | | |--- arrival_date <= 29.00
| | | | | | |--- weights: [0.00, 7.00] class: 1
| | | | | |--- arrival_date > 29.00
| | | | | | |--- weights: [1.00, 0.00] class: 0
| | | |--- lead_time > 86.50
| | | | |--- avg_price_per_room <= 93.33
| | | | | |--- lead_time <= 133.50
| | | | | | |--- lead_time <= 132.50
| | | | | | | |--- no_of_week_nights <= 6.50
| | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | |--- avg_price_per_room <= 58.75
| | | | | | | | | | |--- lead_time <= 124.50
| | | | | | | | | | | |--- weights: [28.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 124.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- avg_price_per_room > 58.75
| | | | | | | | | | |--- avg_price_per_room <= 61.12
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- avg_price_per_room > 61.12
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | |--- weights: [27.00, 0.00] class: 0
| | | | | | | |--- no_of_week_nights > 6.50
| | | | | | | | |--- lead_time <= 110.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 110.50
| | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | |--- lead_time > 132.50
| | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | |--- lead_time > 133.50
| | | | | | |--- arrival_date <= 6.50
| | | | | | | |--- arrival_month <= 7.50
| | | | | | | | |--- lead_time <= 136.50
| | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 136.50
| | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | |--- arrival_month > 7.50
| | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | |--- arrival_date > 6.50
| | | | | | | |--- arrival_date <= 11.50
| | | | | | | | |--- avg_price_per_room <= 73.62
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- avg_price_per_room > 73.62
| | | | | | | | | |--- weights: [12.00, 0.00] class: 0
| | | | | | | |--- arrival_date > 11.50
| | | | | | | | |--- weights: [61.00, 0.00] class: 0
| | | | |--- avg_price_per_room > 93.33
| | | | | |--- arrival_year <= 2017.50
| | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | |--- arrival_date <= 7.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- arrival_date > 7.50
| | | | | | | | |--- lead_time <= 101.00
| | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | |--- lead_time > 101.00
| | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | |--- avg_price_per_room <= 116.50
| | | | | | | | |--- lead_time <= 106.50
| | | | | | | | | |--- lead_time <= 99.50
| | | | | | | | | | |--- weights: [1.00, 1.00] class: 0
| | | | | | | | | |--- lead_time > 99.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 106.50
| | | | | | | | | |--- arrival_date <= 18.50
| | | | | | | | | | |--- arrival_date <= 11.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_date > 11.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- arrival_date > 18.50
| | | | | | | | | | |--- arrival_date <= 23.50
| | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0
| | | | | | | | | | |--- arrival_date > 23.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- avg_price_per_room > 116.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | |--- arrival_year > 2017.50
| | | | | | |--- avg_price_per_room <= 180.00
| | | | | | | |--- avg_price_per_room <= 95.02
| | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | |--- arrival_date <= 30.50
| | | | | | | | | | |--- weights: [0.00, 6.00] class: 1
| | | | | | | | | |--- arrival_date > 30.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- avg_price_per_room > 95.02
| | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | |--- arrival_year <= 2018.50
| | | | | | | | | | |--- lead_time <= 105.50
| | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | | | |--- lead_time > 105.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- arrival_year > 2018.50
| | | | | | | | | | |--- lead_time <= 95.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 95.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | |--- arrival_month <= 6.50
| | | | | | | | | | |--- lead_time <= 144.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- lead_time > 144.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | |--- arrival_month > 6.50
| | | | | | | | | | |--- avg_price_per_room <= 115.55
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- avg_price_per_room > 115.55
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | |--- avg_price_per_room > 180.00
| | | | | | | |--- weights: [0.00, 3.00] class: 1
| | |--- market_segment_type_Online > 0.50
| | | |--- lead_time <= 9.50
| | | | |--- avg_price_per_room <= 200.38
| | | | | |--- lead_time <= 2.50
| | | | | | |--- no_of_week_nights <= 8.50
| | | | | | | |--- arrival_month <= 1.50
| | | | | | | | |--- weights: [102.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 1.50
| | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- avg_price_per_room <= 77.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- avg_price_per_room > 77.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- avg_price_per_room <= 78.77
| | | | | | | | | | | |--- weights: [60.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 78.77
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | |--- arrival_date <= 22.50
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- weights: [72.00, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 22.50
| | | | | | | | | | |--- lead_time <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- lead_time > 1.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | |--- no_of_week_nights > 8.50
| | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- lead_time > 2.50
| | | | | | |--- avg_price_per_room <= 99.42
| | | | | | | |--- arrival_month <= 1.50
| | | | | | | | |--- weights: [75.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 1.50
| | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | |--- avg_price_per_room <= 83.01
| | | | | | | | | | |--- lead_time <= 5.50
| | | | | | | | | | | |--- weights: [47.00, 0.00] class: 0
| | | | | | | | | | |--- lead_time > 5.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- avg_price_per_room > 83.01
| | | | | | | | | | |--- arrival_date <= 21.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- arrival_date > 21.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- weights: [17.00, 0.00] class: 0
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- avg_price_per_room <= 77.70
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- avg_price_per_room > 77.70
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | |--- avg_price_per_room > 99.42
| | | | | | | |--- arrival_month <= 9.50
| | | | | | | | |--- arrival_year <= 2018.50
| | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | |--- no_of_week_nights <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- no_of_week_nights > 0.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | |--- arrival_year > 2018.50
| | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | |--- arrival_date <= 22.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- arrival_date > 22.50
| | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 9.50
| | | | | | | | |--- lead_time <= 7.50
| | | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | | |--- weights: [40.00, 0.00] class: 0
| | | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | | |--- arrival_date <= 11.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- arrival_date > 11.00
| | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 7.50
| | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | |--- avg_price_per_room > 200.38
| | | | | |--- arrival_month <= 2.00
| | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | |--- arrival_month > 2.00
| | | | | | |--- arrival_month <= 11.50
| | | | | | | |--- weights: [0.00, 59.00] class: 1
| | | | | | |--- arrival_month > 11.50
| | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | |--- lead_time > 9.50
| | | | |--- avg_price_per_room <= 200.35
| | | | | |--- required_car_parking_space <= 0.50
| | | | | | |--- avg_price_per_room <= 105.28
| | | | | | | |--- lead_time <= 25.50
| | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | |--- weights: [79.00, 0.00] class: 0
| | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [67.00, 0.00] class: 0
| | | | | | | |--- lead_time > 25.50
| | | | | | | | |--- avg_price_per_room <= 63.11
| | | | | | | | | |--- lead_time <= 69.50
| | | | | | | | | | |--- lead_time <= 27.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- lead_time > 27.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- lead_time > 69.50
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | |--- avg_price_per_room > 63.11
| | | | | | | | | |--- no_of_week_nights <= 7.50
| | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 22
| | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | | |--- truncated branch of depth 24
| | | | | | | | | |--- no_of_week_nights > 7.50
| | | | | | | | | | |--- weights: [0.00, 24.00] class: 1
| | | | | | |--- avg_price_per_room > 105.28
| | | | | | | |--- arrival_year <= 2018.50
| | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | |--- lead_time <= 94.50
| | | | | | | | | | |--- lead_time <= 10.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 10.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- lead_time > 94.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | |--- arrival_month <= 10.50
| | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 26
| | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- arrival_month > 10.50
| | | | | | | | | | |--- lead_time <= 35.00
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- lead_time > 35.00
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | |--- arrival_year > 2018.50
| | | | | | | | |--- lead_time <= 24.50
| | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | |--- weights: [15.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- lead_time > 24.50
| | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | |--- arrival_date <= 4.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | | |--- arrival_date > 4.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | |--- avg_price_per_room <= 134.08
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | | | | | | |--- avg_price_per_room > 134.08
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | |--- required_car_parking_space > 0.50
| | | | | | |--- no_of_week_nights <= 9.00
| | | | | | | |--- weights: [93.00, 0.00] class: 0
| | | | | | |--- no_of_week_nights > 9.00
| | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | |--- avg_price_per_room > 200.35
| | | | | |--- arrival_month <= 11.50
| | | | | | |--- arrival_month <= 1.50
| | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- arrival_month > 1.50
| | | | | | | |--- weights: [0.00, 264.00] class: 1
| | | | | |--- arrival_month > 11.50
| | | | | | |--- weights: [5.00, 0.00] class: 0
| |--- no_of_special_requests > 0.50
| | |--- no_of_special_requests <= 1.50
| | | |--- lead_time <= 7.50
| | | | |--- no_of_week_nights <= 9.50
| | | | | |--- avg_price_per_room <= 140.62
| | | | | | |--- no_of_adults <= 0.50
| | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | |--- no_of_adults > 0.50
| | | | | | | |--- avg_price_per_room <= 74.75
| | | | | | | | |--- lead_time <= 6.50
| | | | | | | | | |--- weights: [319.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 6.50
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- arrival_date <= 14.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 14.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- weights: [14.00, 0.00] class: 0
| | | | | | | |--- avg_price_per_room > 74.75
| | | | | | | | |--- no_of_weekend_nights <= 3.50
| | | | | | | | | |--- lead_time <= 2.50
| | | | | | | | | | |--- no_of_children <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | | |--- no_of_children > 1.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- lead_time > 2.50
| | | | | | | | | | |--- arrival_date <= 13.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | | |--- arrival_date > 13.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | |--- no_of_weekend_nights > 3.50
| | | | | | | | | |--- lead_time <= 1.00
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 1.00
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- avg_price_per_room > 140.62
| | | | | | |--- lead_time <= 4.50
| | | | | | | |--- avg_price_per_room <= 241.00
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | |--- avg_price_per_room <= 142.10
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 142.10
| | | | | | | | | | |--- no_of_week_nights <= 3.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- no_of_week_nights > 3.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | |--- avg_price_per_room <= 179.50
| | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 179.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | |--- avg_price_per_room > 241.00
| | | | | | | | |--- avg_price_per_room <= 243.40
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- avg_price_per_room > 243.40
| | | | | | | | | |--- lead_time <= 0.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- lead_time > 0.50
| | | | | | | | | | |--- no_of_week_nights <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_week_nights > 0.50
| | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0
| | | | | | |--- lead_time > 4.50
| | | | | | | |--- arrival_date <= 23.50
| | | | | | | | |--- avg_price_per_room <= 143.75
| | | | | | | | | |--- avg_price_per_room <= 141.67
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- avg_price_per_room > 141.67
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | |--- avg_price_per_room > 143.75
| | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | | |--- arrival_month <= 5.50
| | | | | | | | | | | |--- weights: [22.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_month > 5.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | | |--- no_of_week_nights <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_week_nights > 2.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | |--- arrival_date > 23.50
| | | | | | | | |--- weights: [26.00, 0.00] class: 0
| | | | |--- no_of_week_nights > 9.50
| | | | | |--- arrival_month <= 7.50
| | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | |--- arrival_month > 7.50
| | | | | | |--- weights: [0.00, 3.00] class: 1
| | | |--- lead_time > 7.50
| | | | |--- avg_price_per_room <= 121.78
| | | | | |--- market_segment_type_Online <= 0.50
| | | | | | |--- lead_time <= 94.50
| | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | |--- weights: [487.00, 0.00] class: 0
| | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | |--- avg_price_per_room <= 99.00
| | | | | | | | | |--- weights: [20.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 99.00
| | | | | | | | | |--- lead_time <= 63.00
| | | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0
| | | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 63.00
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | |--- lead_time > 94.50
| | | | | | | |--- no_of_weekend_nights <= 3.00
| | | | | | | | |--- avg_price_per_room <= 86.60
| | | | | | | | | |--- no_of_week_nights <= 1.50
| | | | | | | | | | |--- arrival_date <= 25.50
| | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0
| | | | | | | | | | |--- arrival_date > 25.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- no_of_week_nights > 1.50
| | | | | | | | | | |--- lead_time <= 147.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- lead_time > 147.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- avg_price_per_room > 86.60
| | | | | | | | | |--- avg_price_per_room <= 89.55
| | | | | | | | | | |--- arrival_month <= 5.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- arrival_month > 5.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 89.55
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | |--- no_of_weekend_nights > 3.00
| | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- market_segment_type_Online > 0.50
| | | | | | |--- lead_time <= 90.50
| | | | | | | |--- arrival_month <= 1.50
| | | | | | | | |--- no_of_week_nights <= 9.50
| | | | | | | | | |--- weights: [269.00, 0.00] class: 0
| | | | | | | | |--- no_of_week_nights > 9.50
| | | | | | | | | |--- lead_time <= 33.00
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 33.00
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- arrival_month > 1.50
| | | | | | | | |--- no_of_weekend_nights <= 2.50
| | | | | | | | | |--- arrival_month <= 11.50
| | | | | | | | | | |--- avg_price_per_room <= 67.36
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | | |--- avg_price_per_room > 67.36
| | | | | | | | | | | |--- truncated branch of depth 28
| | | | | | | | | |--- arrival_month > 11.50
| | | | | | | | | | |--- weights: [233.00, 0.00] class: 0
| | | | | | | | |--- no_of_weekend_nights > 2.50
| | | | | | | | | |--- no_of_week_nights <= 8.50
| | | | | | | | | | |--- arrival_date <= 5.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_date > 5.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- no_of_week_nights > 8.50
| | | | | | | | | | |--- weights: [0.00, 10.00] class: 1
| | | | | | |--- lead_time > 90.50
| | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | |--- lead_time <= 96.50
| | | | | | | | | |--- arrival_date <= 11.50
| | | | | | | | | | |--- arrival_date <= 8.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- arrival_date > 8.00
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 11.50
| | | | | | | | | | |--- weights: [0.00, 9.00] class: 1
| | | | | | | | |--- lead_time > 96.50
| | | | | | | | | |--- arrival_date <= 24.50
| | | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- arrival_date > 24.50
| | | | | | | | | | |--- avg_price_per_room <= 51.23
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- avg_price_per_room > 51.23
| | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0
| | | | | | | |--- arrival_year > 2017.50
| | | | | | | | |--- avg_price_per_room <= 72.08
| | | | | | | | | |--- weights: [46.00, 0.00] class: 0
| | | | | | | | |--- avg_price_per_room > 72.08
| | | | | | | | | |--- arrival_month <= 3.50
| | | | | | | | | | |--- lead_time <= 127.00
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- lead_time > 127.00
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- arrival_month > 3.50
| | | | | | | | | | |--- arrival_month <= 9.50
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- arrival_month > 9.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | |--- avg_price_per_room > 121.78
| | | | | |--- required_car_parking_space <= 0.50
| | | | | | |--- arrival_year <= 2018.50
| | | | | | | |--- arrival_month <= 11.50
| | | | | | | | |--- arrival_month <= 8.50
| | | | | | | | | |--- no_of_week_nights <= 7.50
| | | | | | | | | | |--- arrival_date <= 19.50
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | | | | | | |--- arrival_date > 19.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- no_of_week_nights > 7.50
| | | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | |--- arrival_month > 8.50
| | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50
| | | | | | | | | | |--- no_of_children <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | | |--- no_of_children > 0.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50
| | | | | | | | | | |--- avg_price_per_room <= 162.70
| | | | | | | | | | | |--- truncated branch of depth 21
| | | | | | | | | | |--- avg_price_per_room > 162.70
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | |--- arrival_month > 11.50
| | | | | | | | |--- lead_time <= 102.50
| | | | | | | | | |--- weights: [68.00, 0.00] class: 0
| | | | | | | | |--- lead_time > 102.50
| | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | |--- arrival_year > 2018.50
| | | | | | | |--- arrival_month <= 7.50
| | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50
| | | | | | | | | |--- avg_price_per_room <= 177.75
| | | | | | | | | | |--- lead_time <= 117.50
| | | | | | | | | | | |--- truncated branch of depth 22
| | | | | | | | | | |--- lead_time > 117.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | |--- avg_price_per_room > 177.75
| | | | | | | | | | |--- lead_time <= 93.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- lead_time > 93.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50
| | | | | | | | | |--- lead_time <= 66.50
| | | | | | | | | | |--- arrival_date <= 30.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | | | |--- arrival_date > 30.50
| | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 66.50
| | | | | | | | | | |--- avg_price_per_room <= 134.17
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- avg_price_per_room > 134.17
| | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 7.50
| | | | | | | | |--- lead_time <= 134.50
| | | | | | | | | |--- lead_time <= 14.50
| | | | | | | | | | |--- lead_time <= 8.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- lead_time > 8.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | |--- lead_time > 14.50
| | | | | | | | | | |--- lead_time <= 33.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | | |--- lead_time > 33.50
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | |--- lead_time > 134.50
| | | | | | | | | |--- arrival_date <= 11.50
| | | | | | | | | | |--- arrival_date <= 2.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- arrival_date > 2.00
| | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 11.50
| | | | | | | | | | |--- lead_time <= 139.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 139.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | |--- required_car_parking_space > 0.50
| | | | | | |--- weights: [129.00, 0.00] class: 0
| | |--- no_of_special_requests > 1.50
| | | |--- lead_time <= 90.50
| | | | |--- no_of_week_nights <= 3.50
| | | | | |--- weights: [2973.00, 0.00] class: 0
| | | | |--- no_of_week_nights > 3.50
| | | | | |--- no_of_week_nights <= 9.50
| | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | |--- lead_time <= 8.50
| | | | | | | | |--- no_of_week_nights <= 4.50
| | | | | | | | | |--- weights: [51.00, 0.00] class: 0
| | | | | | | | |--- no_of_week_nights > 4.50
| | | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | | |--- lead_time <= 6.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 6.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | | |--- weights: [18.00, 0.00] class: 0
| | | | | | | |--- lead_time > 8.50
| | | | | | | | |--- arrival_date <= 30.50
| | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | |--- weights: [30.00, 0.00] class: 0
| | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | |--- avg_price_per_room <= 91.10
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- avg_price_per_room > 91.10
| | | | | | | | | | | |--- truncated branch of depth 16
| | | | | | | | |--- arrival_date > 30.50
| | | | | | | | | |--- no_of_weekend_nights <= 1.50
| | | | | | | | | | |--- avg_price_per_room <= 77.09
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- avg_price_per_room > 77.09
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- no_of_weekend_nights > 1.50
| | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | |--- no_of_special_requests > 2.50
| | | | | | | |--- weights: [97.00, 0.00] class: 0
| | | | | |--- no_of_week_nights > 9.50
| | | | | | |--- no_of_special_requests <= 2.50
| | | | | | | |--- no_of_previous_bookings_not_canceled <= 0.50
| | | | | | | | |--- weights: [0.00, 7.00] class: 1
| | | | | | | |--- no_of_previous_bookings_not_canceled > 0.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- no_of_special_requests > 2.50
| | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | |--- lead_time > 90.50
| | | | |--- avg_price_per_room <= 200.70
| | | | | |--- no_of_special_requests <= 2.50
| | | | | | |--- arrival_year <= 2018.50
| | | | | | | |--- arrival_month <= 8.50
| | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | |--- arrival_date <= 14.00
| | | | | | | | | | |--- arrival_month <= 7.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- arrival_month > 7.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- arrival_date > 14.00
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- arrival_year > 2017.50
| | | | | | | | | |--- no_of_week_nights <= 6.50
| | | | | | | | | | |--- arrival_date <= 2.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- arrival_date > 2.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- no_of_week_nights > 6.50
| | | | | | | | | | |--- arrival_date <= 25.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- arrival_date > 25.00
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- arrival_month > 8.50
| | | | | | | | |--- arrival_date <= 8.50
| | | | | | | | | |--- no_of_weekend_nights <= 0.50
| | | | | | | | | | |--- lead_time <= 118.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- lead_time > 118.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | |--- no_of_weekend_nights > 0.50
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- arrival_date > 8.50
| | | | | | | | | |--- arrival_date <= 11.50
| | | | | | | | | | |--- lead_time <= 104.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 104.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- arrival_date > 11.50
| | | | | | | | | | |--- lead_time <= 144.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- lead_time > 144.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | |--- arrival_year > 2018.50
| | | | | | | |--- avg_price_per_room <= 95.25
| | | | | | | | |--- no_of_week_nights <= 4.50
| | | | | | | | | |--- avg_price_per_room <= 74.82
| | | | | | | | | | |--- lead_time <= 99.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- lead_time > 99.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- avg_price_per_room > 74.82
| | | | | | | | | | |--- arrival_month <= 1.50
| | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1
| | | | | | | | | | |--- arrival_month > 1.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | |--- no_of_week_nights > 4.50
| | | | | | | | | |--- weights: [0.00, 6.00] class: 1
| | | | | | | |--- avg_price_per_room > 95.25
| | | | | | | | |--- avg_price_per_room <= 172.12
| | | | | | | | | |--- no_of_week_nights <= 5.50
| | | | | | | | | | |--- arrival_month <= 4.50
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- arrival_month > 4.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | |--- no_of_week_nights > 5.50
| | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | |--- avg_price_per_room > 172.12
| | | | | | | | | |--- lead_time <= 145.50
| | | | | | | | | | |--- lead_time <= 141.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- lead_time > 141.50
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | | |--- lead_time > 145.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | |--- no_of_special_requests > 2.50
| | | | | | |--- weights: [167.00, 0.00] class: 0
| | | | |--- avg_price_per_room > 200.70
| | | | | |--- no_of_special_requests <= 2.50
| | | | | | |--- weights: [0.00, 23.00] class: 1
| | | | | |--- no_of_special_requests > 2.50
| | | | | | |--- weights: [4.00, 0.00] class: 0
|--- lead_time > 150.50
| |--- avg_price_per_room <= 100.04
| | |--- market_segment_type_Online <= 0.50
| | | |--- lead_time <= 348.50
| | | | |--- avg_price_per_room <= 89.76
| | | | | |--- no_of_special_requests <= 0.50
| | | | | | |--- lead_time <= 258.00
| | | | | | | |--- no_of_previous_bookings_not_canceled <= 0.50
| | | | | | | | |--- no_of_weekend_nights <= 3.50
| | | | | | | | | |--- arrival_date <= 25.50
| | | | | | | | | | |--- lead_time <= 159.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- lead_time > 159.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | |--- arrival_date > 25.50
| | | | | | | | | | |--- no_of_adults <= 1.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_adults > 1.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- no_of_weekend_nights > 3.50
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- no_of_previous_bookings_not_canceled > 0.50
| | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | |--- lead_time > 258.00
| | | | | | | |--- avg_price_per_room <= 88.50
| | | | | | | | |--- arrival_date <= 20.50
| | | | | | | | | |--- arrival_date <= 6.50
| | | | | | | | | | |--- weights: [7.00, 0.00] class: 0
| | | | | | | | | |--- arrival_date > 6.50
| | | | | | | | | | |--- arrival_year <= 2017.50
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | |
| | | | | |--- arrival_year > 2017.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- arrival_date > 20.50 | | | | | | | | | |--- avg_price_per_room <= 84.38 | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_per_room > 84.38 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 88.50 | | | | | | | | |--- lead_time <= 328.50 | | | | | | | | | |--- arrival_date <= 10.00 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- arrival_date > 10.00 | | | | | | | | | | |--- lead_time <= 309.50 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | | |--- lead_time > 309.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 328.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- no_of_special_requests > 0.50 | | | | | | |--- lead_time <= 151.50 | | | | | | | |--- arrival_month <= 7.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- arrival_month > 7.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time > 151.50 | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | |--- arrival_date <= 3.50 | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_date > 3.50 | | | | | | | | | |--- weights: [97.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.50 | | | | | | | | |--- lead_time <= 168.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 168.00 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | |--- 
avg_price_per_room > 89.76 | | | | | |--- no_of_special_requests <= 0.50 | | | | | | |--- arrival_month <= 6.50 | | | | | | | |--- arrival_date <= 2.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 2.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- no_of_weekend_nights <= 1.50 | | | | | | | | | | |--- arrival_date <= 7.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | | |--- arrival_date > 7.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- no_of_weekend_nights > 1.50 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- lead_time <= 243.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time > 243.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- arrival_month > 6.50 | | | | | | | |--- lead_time <= 233.50 | | | | | | | | |--- lead_time <= 170.50 | | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | | |--- arrival_date <= 3.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- arrival_date > 3.00 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | | |--- no_of_week_nights <= 6.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_week_nights > 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time > 170.50 | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | |--- lead_time > 233.50 | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | |--- avg_price_per_room <= 98.00 | | | | | | | | | | |--- lead_time <= 310.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time > 310.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | 
| |--- avg_price_per_room > 98.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 95.47 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 95.47 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | |--- no_of_special_requests > 0.50 | | | | | | |--- avg_price_per_room <= 90.25 | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 90.25 | | | | | | | |--- arrival_date <= 30.00 | | | | | | | | |--- arrival_date <= 10.50 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- lead_time <= 189.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time > 189.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- arrival_date > 10.50 | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | |--- arrival_date > 30.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | |--- lead_time > 348.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- arrival_month <= 9.50 | | | | | | |--- avg_price_per_room <= 91.42 | | | | | | | |--- weights: [0.00, 34.00] class: 1 | | | | | | |--- 
avg_price_per_room > 91.42 | | | | | | | |--- lead_time <= 414.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time > 414.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- arrival_month > 9.50 | | | | | | |--- no_of_weekend_nights <= 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- no_of_weekend_nights > 0.50 | | | | | | | |--- arrival_date <= 13.50 | | | | | | | | |--- avg_price_per_room <= 93.33 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | |--- avg_price_per_room > 93.33 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | |--- arrival_date > 13.50 | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- avg_price_per_room <= 73.44 | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | |--- avg_price_per_room > 73.44 | | | | | | |--- arrival_date <= 9.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- arrival_date > 9.00 | | | | | | | |--- avg_price_per_room <= 92.75 | | | | | | | | |--- no_of_special_requests <= 2.00 | | | | | | | | | |--- no_of_weekend_nights <= 1.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_weekend_nights > 1.00 | | | | | | | | | | |--- avg_price_per_room <= 84.00 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | | |--- avg_price_per_room > 84.00 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | |--- no_of_special_requests > 2.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 92.75 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | |--- market_segment_type_Online > 0.50 | | | |--- no_of_special_requests <= 0.50 | | | | |--- avg_price_per_room <= 2.50 | | | | | |--- lead_time <= 285.50 | | | | | | |--- arrival_date <= 8.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- weights: [1.00, 
0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | |--- arrival_date > 8.50 | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | |--- lead_time > 285.50 | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- avg_price_per_room > 2.50 | | | | | |--- no_of_adults <= 2.50 | | | | | | |--- arrival_month <= 11.50 | | | | | | | |--- weights: [0.00, 595.00] class: 1 | | | | | | |--- arrival_month > 11.50 | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | |--- avg_price_per_room <= 72.27 | | | | | | | | | |--- lead_time <= 277.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- lead_time > 277.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- avg_price_per_room > 72.27 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | |--- lead_time <= 223.50 | | | | | | | | | |--- lead_time <= 212.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | |--- lead_time > 212.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time > 223.50 | | | | | | | | | |--- weights: [0.00, 38.00] class: 1 | | | | | |--- no_of_adults > 2.50 | | | | | | |--- avg_price_per_room <= 88.41 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 88.41 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | |--- no_of_special_requests > 0.50 | | | | |--- no_of_weekend_nights <= 0.50 | | | | | |--- lead_time <= 180.50 | | | | | | |--- arrival_month <= 2.50 | | | | | | | |--- avg_price_per_room <= 88.55 | | | | | | 
| | |--- lead_time <= 172.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | |--- lead_time > 172.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 88.55 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- arrival_month > 2.50 | | | | | | | |--- no_of_special_requests <= 1.50 | | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | |--- arrival_month <= 6.50 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- arrival_month > 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | |--- arrival_date <= 30.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- arrival_date > 30.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- arrival_month > 10.50 | | | | | | | | | |--- no_of_week_nights <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_week_nights > 1.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- no_of_special_requests > 1.50 | | | | | | | | |--- weights: [23.00, 0.00] class: 0 | | | | | |--- lead_time > 180.50 | | | | | | |--- no_of_special_requests <= 2.50 | | | | | | | |--- arrival_month <= 11.50 | | | | | | | | |--- avg_price_per_room <= 36.16 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- avg_price_per_room > 36.16 | | | | | | | | | |--- arrival_month <= 1.50 | | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- no_of_week_nights > 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- arrival_month > 1.50 | | | | | | | | | | |--- weights: [0.00, 182.00] class: 1 | | | | | | | |--- arrival_month > 11.50 | | | | | | | | |--- arrival_date <= 6.50 | | | | | | | | | 
|--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 6.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_children > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- no_of_special_requests > 2.50 | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | |--- no_of_weekend_nights > 0.50 | | | | | |--- no_of_special_requests <= 2.50 | | | | | | |--- arrival_year <= 2018.50 | | | | | | | |--- arrival_month <= 10.50 | | | | | | | | |--- avg_price_per_room <= 93.01 | | | | | | | | | |--- no_of_weekend_nights <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 81.12 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- avg_price_per_room > 81.12 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- no_of_weekend_nights > 2.50 | | | | | | | | | | |--- arrival_date <= 6.00 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- arrival_date > 6.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- avg_price_per_room > 93.01 | | | | | | | | | |--- no_of_week_nights <= 2.50 | | | | | | | | | | |--- avg_price_per_room <= 95.97 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- avg_price_per_room > 95.97 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- 
no_of_week_nights > 2.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- arrival_month > 10.50 | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | |--- lead_time <= 306.50 | | | | | | | | | | |--- no_of_week_nights <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_week_nights > 4.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time > 306.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- arrival_year > 2018.50 | | | | | | | |--- avg_price_per_room <= 76.79 | | | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | | | |--- weights: [18.00, 0.00] class: 0 | | | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | | | |--- lead_time <= 199.00 | | | | | | | | | | |--- no_of_adults <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 1.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- lead_time > 199.00 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- avg_price_per_room > 76.79 | | | | | | | | |--- avg_price_per_room <= 78.06 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | |--- avg_price_per_room > 78.06 | | | | | | | | | |--- lead_time <= 330.50 | | | | | | | | | | |--- arrival_date <= 8.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- arrival_date > 8.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- lead_time > 330.50 | | | | | | | | | | |--- arrival_month <= 6.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- 
arrival_month > 6.00 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | |--- no_of_special_requests > 2.50 | | | | | | |--- weights: [42.00, 0.00] class: 0 | |--- avg_price_per_room > 100.04 | | |--- no_of_special_requests <= 2.50 | | | |--- arrival_month <= 11.50 | | | | |--- arrival_month <= 1.50 | | | | | |--- no_of_week_nights <= 2.50 | | | | | | |--- lead_time <= 213.00 | | | | | | | |--- lead_time <= 204.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- lead_time > 204.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- lead_time > 213.00 | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | |--- no_of_week_nights > 2.50 | | | | | | |--- no_of_special_requests <= 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- no_of_special_requests > 0.50 | | | | | | | |--- arrival_date <= 15.00 | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- arrival_date > 15.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- arrival_month > 1.50 | | | | | |--- weights: [0.00, 2630.00] class: 1 | | | |--- arrival_month > 11.50 | | | | |--- no_of_special_requests <= 0.50 | | | | | |--- weights: [38.00, 0.00] class: 0 | | | | |--- no_of_special_requests > 0.50 | | | | | |--- arrival_date <= 24.00 | | | | | | |--- avg_price_per_room <= 156.82 | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | |--- avg_price_per_room > 156.82 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- arrival_date > 24.00 | | | | | | |--- no_of_week_nights <= 3.50 | | | | | | | |--- no_of_children <= 0.50 | | | | | | | | |--- arrival_date <= 29.50 | | | | | | | | | |--- avg_price_per_room <= 119.00 | | | | | | | | | | |--- no_of_adults <= 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_adults > 2.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- avg_price_per_room > 119.00 | | | | | | | | 
| | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- arrival_date > 29.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- no_of_children > 0.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- no_of_week_nights > 3.50 | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | |--- no_of_special_requests > 2.50 | | | |--- weights: [149.00, 0.00] class: 0
# importance of features in the tree building ( The importance of a feature is computed as the
# (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance )
print(
    pd.DataFrame(
        model.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.347163
avg_price_per_room                    0.155664
no_of_special_requests                0.100093
arrival_date                          0.095252
market_segment_type_Online            0.079360
arrival_month                         0.064333
no_of_week_nights                     0.054437
no_of_weekend_nights                  0.028943
arrival_year                          0.018988
no_of_adults                          0.015609
room_type_reserved_Room_Type 4        0.009020
type_of_meal_plan_Not Selected        0.008478
required_car_parking_space            0.006936
no_of_children                        0.006726
type_of_meal_plan_Meal Plan 2         0.003169
room_type_reserved_Room_Type 2        0.001930
room_type_reserved_Room_Type 5        0.001459
room_type_reserved_Room_Type 6        0.000812
market_segment_type_Offline           0.000721
market_segment_type_Corporate         0.000372
repeated_guest                        0.000282
no_of_previous_bookings_not_canceled  0.000142
no_of_previous_cancellations          0.000111
type_of_meal_plan_Meal Plan 3         0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 7        0.000000
market_segment_type_Complementary     0.000000
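To make the definition in the comment above concrete, here is a minimal sketch that recomputes the impurity-based importances by hand from a fitted tree and compares them with `feature_importances_`. The synthetic data and the names `X_demo`, `y_demo`, `demo_tree` and `manual` are assumptions made for illustration; they are not part of the hotel bookings analysis.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; not the hotel bookings dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=1
)
demo_tree = DecisionTreeClassifier(random_state=1).fit(X_demo, y_demo)

imp = demo_tree.feature_importances_
t = demo_tree.tree_

# Sum the weighted impurity decrease of every split, grouped by split feature
manual = np.zeros(X_demo.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # -1 marks a leaf, which contributes no split
        continue
    decrease = (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    )
    manual[t.feature[node]] += decrease

# Normalize so the importances sum to 1
manual = manual / manual.sum()

print(np.allclose(manual, imp))
```

Because the values are normalized, they only rank features relative to one another within a single tree; they are not comparable in absolute terms across differently sized trees.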
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1, class_weight={0: 0.15, 1: 0.85})
# Grid of parameters to choose from
parameters = {
    "max_depth": [5, 10, 15, None],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
    "min_impurity_decrease": [0.00001, 0.0001, 0.01],
}
# Type of scoring used to compare parameter combinations
scorer = make_scorer(recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.15, 1: 0.85}, criterion='entropy',
max_depth=5, min_impurity_decrease=1e-05, random_state=1,
splitter='random')
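The grid search above only surfaces the winning estimator. As a hedged sketch of how the search itself could be inspected, the example below runs a smaller grid on synthetic data and reads `best_params_`, `best_score_` and `cv_results_`. The data and the names `X_demo`, `y_demo`, `demo_grid` and `results` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, mildly imbalanced stand-in data (an assumption)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, weights=[0.67, 0.33], random_state=1
)

demo_grid = GridSearchCV(
    DecisionTreeClassifier(random_state=1, class_weight={0: 0.15, 1: 0.85}),
    {"max_depth": [3, 5, None], "criterion": ["entropy", "gini"]},
    scoring=make_scorer(recall_score),
    cv=5,
)
demo_grid.fit(X_demo, y_demo)

# Best parameter combination and its mean cross-validated recall
print(demo_grid.best_params_)
print(round(demo_grid.best_score_, 3))

# One row per parameter combination, ranked by mean cross-validated recall
results = pd.DataFrame(demo_grid.cv_results_)[
    ["params", "mean_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score").head())
```

With the default `refit=True`, `best_estimator_` is already refit on the full data passed to `fit`, so fitting it again is harmless but not required.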
Checking performance on training set
decision_tree_tune_perf_train = model_performance_classification_statsmodels(
    estimator, X_train, y_train
)
decision_tree_tune_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.571621 | 0.966706 | 0.439559 | 0.60433 |
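As a quick check on how the F1 column relates to the other two, the sketch below recomputes F1 as the harmonic mean of the precision and recall reported in the training table above. The variable names are illustrative assumptions.

```python
# Values copied from the training-set table of the tuned tree
recall = 0.966706
precision = 0.439559

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

The result agrees with the reported F1 of about 0.6043, which shows how the low precision pulls F1 down even though recall is close to 0.97.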
confusion_matrix_statsmodels(estimator, X_train, y_train)
Checking model performance on test set
decision_tree_tune_perf_test = model_performance_classification_statsmodels(
    estimator, X_test, y_test
)
decision_tree_tune_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.574231 | 0.967217 | 0.447536 | 0.61193 |
confusion_matrix_statsmodels(estimator, X_test, y_test)
plt.figure(figsize=(15, 12))
tree.plot_tree(
    estimator,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=True,
    class_names=True,
)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(estimator, feature_names=feature_names, show_weights=True))
|--- lead_time <= 40.78
|   |--- market_segment_type_Online <= 0.01
|   |   |--- repeated_guest <= 0.07
|   |   |   |--- market_segment_type_Offline <= 0.38
|   |   |   |   |--- market_segment_type_Complementary <= 0.60
|   |   |   |   |   |--- weights: [95.10, 88.40] class: 0
|   |   |   |   |--- market_segment_type_Complementary > 0.60
|   |   |   |   |   |--- weights: [26.25, 0.00] class: 0
|   |   |   |--- market_segment_type_Offline > 0.38
|   |   |   |   |--- arrival_month <= 10.06
|   |   |   |   |   |--- weights: [178.80, 41.65] class: 0
|   |   |   |   |--- arrival_month > 10.06
|   |   |   |   |   |--- weights: [34.95, 0.85] class: 0
|   |   |--- repeated_guest > 0.07
|   |   |   |--- lead_time <= 31.22
|   |   |   |   |--- weights: [116.10, 0.00] class: 0
|   |   |   |--- lead_time > 31.22
|   |   |   |   |--- no_of_previous_cancellations <= 2.97
|   |   |   |   |   |--- weights: [2.10, 0.00] class: 0
|   |   |   |   |--- no_of_previous_cancellations > 2.97
|   |   |   |   |   |--- weights: [0.00, 0.85] class: 1
|   |--- market_segment_type_Online > 0.01
|   |   |--- no_of_special_requests <= 1.84
|   |   |   |--- no_of_special_requests <= 0.59
|   |   |   |   |--- lead_time <= 23.15
|   |   |   |   |   |--- weights: [265.65, 650.25] class: 1
|   |   |   |   |--- lead_time > 23.15
|   |   |   |   |   |--- weights: [58.35, 522.75] class: 1
|   |   |   |--- no_of_special_requests > 0.59
|   |   |   |   |--- lead_time <= 4.85
|   |   |   |   |   |--- weights: [130.95, 29.75] class: 0
|   |   |   |   |--- lead_time > 4.85
|   |   |   |   |   |--- weights: [383.70, 572.05] class: 1
|   |   |--- no_of_special_requests > 1.84
|   |   |   |--- no_of_week_nights <= 2.26
|   |   |   |   |--- weights: [199.05, 0.00] class: 0
|   |   |   |--- no_of_week_nights > 2.26
|   |   |   |   |--- no_of_special_requests <= 2.37
|   |   |   |   |   |--- weights: [78.60, 22.95] class: 0
|   |   |   |   |--- no_of_special_requests > 2.37
|   |   |   |   |   |--- weights: [16.50, 0.00] class: 0
|--- lead_time > 40.78
|   |--- market_segment_type_Online <= 0.51
|   |   |--- lead_time <= 184.18
|   |   |   |--- lead_time <= 101.75
|   |   |   |   |--- lead_time <= 85.49
|   |   |   |   |   |--- weights: [145.20, 61.20] class: 0
|   |   |   |   |--- lead_time > 85.49
|   |   |   |   |   |--- weights: [39.60, 39.10] class: 0
|   |   |   |--- lead_time > 101.75
|   |   |   |   |--- avg_price_per_room <= 195.32
|   |   |   |   |   |--- weights: [109.95, 175.95] class: 1
|   |   |   |   |--- avg_price_per_room > 195.32
|   |   |   |   |   |--- weights: [0.00, 8.50] class: 1
|   |   |--- lead_time > 184.18
|   |   |   |--- avg_price_per_room <= 109.83
|   |   |   |   |--- avg_price_per_room <= 84.90
|   |   |   |   |   |--- weights: [23.85, 34.00] class: 1
|   |   |   |   |--- avg_price_per_room > 84.90
|   |   |   |   |   |--- weights: [12.60, 90.95] class: 1
|   |   |   |--- avg_price_per_room > 109.83
|   |   |   |   |--- no_of_special_requests <= 2.58
|   |   |   |   |   |--- weights: [1.20, 58.65] class: 1
|   |   |   |   |--- no_of_special_requests > 2.58
|   |   |   |   |   |--- weights: [0.30, 0.00] class: 0
|   |--- market_segment_type_Online > 0.51
|   |   |--- lead_time <= 254.70
|   |   |   |--- lead_time <= 101.53
|   |   |   |   |--- required_car_parking_space <= 0.05
|   |   |   |   |   |--- weights: [616.05, 1995.80] class: 1
|   |   |   |   |--- required_car_parking_space > 0.05
|   |   |   |   |   |--- weights: [25.80, 0.85] class: 0
|   |   |   |--- lead_time > 101.53
|   |   |   |   |--- no_of_special_requests <= 0.05
|   |   |   |   |   |--- weights: [84.75, 1865.75] class: 1
|   |   |   |   |--- no_of_special_requests > 0.05
|   |   |   |   |   |--- weights: [282.00, 1719.55] class: 1
|   |   |--- lead_time > 254.70
|   |   |   |--- no_of_special_requests <= 0.71
|   |   |   |   |--- no_of_adults <= 2.70
|   |   |   |   |   |--- weights: [1.80, 230.35] class: 1
|   |   |   |   |--- no_of_adults > 2.70
|   |   |   |   |   |--- weights: [0.90, 18.70] class: 1
|   |   |   |--- no_of_special_requests > 0.71
|   |   |   |   |--- no_of_special_requests <= 3.11
|   |   |   |   |   |--- weights: [19.50, 323.85] class: 1
|   |   |   |   |--- no_of_special_requests > 3.11
|   |   |   |   |   |--- weights: [1.05, 0.00] class: 0
# importance of features in the tree building ( The importance of a feature is computed as the
# (normalized) total reduction of the 'criterion' brought by that feature. It is also known as the Gini importance )
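print(
    pd.DataFrame(
        estimator.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
# After tuning, importance is concentrated in fewer features: lead_time,
# no_of_special_requests and market_segment_type_Online together account for about 91% of the total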
                                           Imp
lead_time                             0.489695
no_of_special_requests                0.236461
market_segment_type_Online            0.183413
required_car_parking_space            0.020084
repeated_guest                        0.018261
no_of_week_nights                     0.014591
avg_price_per_room                    0.012618
market_segment_type_Offline           0.011329
market_segment_type_Complementary     0.009471
arrival_month                         0.002564
no_of_previous_cancellations          0.001067
no_of_adults                          0.000445
no_of_previous_bookings_not_canceled  0.000000
no_of_children                        0.000000
type_of_meal_plan_Meal Plan 2         0.000000
arrival_date                          0.000000
type_of_meal_plan_Not Selected        0.000000
room_type_reserved_Room_Type 2        0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 4        0.000000
room_type_reserved_Room_Type 5        0.000000
room_type_reserved_Room_Type 6        0.000000
room_type_reserved_Room_Type 7        0.000000
arrival_year                          0.000000
market_segment_type_Corporate         0.000000
no_of_weekend_nights                  0.000000
type_of_meal_plan_Meal Plan 3         0.000000
importances = estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
The DecisionTreeClassifier provides parameters such as
min_samples_leaf and max_depth to prevent a tree from overfitting. Cost
complexity pruning provides another option to control the size of a tree. In
DecisionTreeClassifier, this pruning technique is parameterized by the
cost complexity parameter, ccp_alpha. Greater values of ccp_alpha
increase the number of nodes pruned. Here we only show the effect of
ccp_alpha on regularizing the trees and how to choose a ccp_alpha
based on validation scores.
Minimal cost complexity pruning recursively finds the node with the "weakest
link". The weakest link is characterized by an effective alpha, where the
nodes with the smallest effective alpha are pruned first. To get an idea of
what values of ccp_alpha could be appropriate, scikit-learn provides
DecisionTreeClassifier.cost_complexity_pruning_path that returns the
effective alphas and the corresponding total leaf impurities at each step of
the pruning process. As alpha increases, more of the tree is pruned, which
increases the total impurity of its leaves.
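The two quantities described above can be written out explicitly. This follows the formulation in the scikit-learn user guide, where $R(T)$ is the total sample-weighted impurity of the leaves of tree $T$ and $|\widetilde{T}|$ is its number of leaves; the symbols are notation introduced here, not names used elsewhere in this notebook.

```latex
% Cost-complexity measure of a tree T for a given alpha >= 0
R_\alpha(T) = R(T) + \alpha \, |\widetilde{T}|

% Effective alpha of an internal node t, where T_t is the subtree rooted at t
% and R(t) is the impurity of t if it were collapsed into a single leaf
\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{|\widetilde{T}_t| - 1}
```

The node with the smallest $\alpha_{\mathrm{eff}}$ is the weakest link: collapsing it costs the least impurity per leaf removed, so it is pruned first.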
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000 | 0.003800 |
| 1 | 0.000000 | 0.003800 |
| 2 | 0.000000 | 0.003800 |
| 3 | 0.000000 | 0.003800 |
| 4 | 0.000000 | 0.003800 |
| ... | ... | ... |
| 2005 | 0.009093 | 0.299653 |
| 2006 | 0.012134 | 0.311787 |
| 2007 | 0.012991 | 0.324779 |
| 2008 | 0.024660 | 0.374098 |
| 2009 | 0.073681 | 0.447779 |
2010 rows × 2 columns
fig, ax = plt.subplots(figsize=(15, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
Next, we train a decision tree using the effective alphas. The last value in ccp_alphas is the alpha value that prunes the whole tree, leaving the tree, clfs[-1], with one node.
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.07368068552560186
For the remainder, we remove the last element in clfs and ccp_alphas, because it is the trivial tree with only one node. Here we show that the number of nodes and the tree depth decrease as alpha increases.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
Recall vs alpha for training and testing sets
recall_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)
recall_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
# selecting the model with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=0.01299122169370226, random_state=1)
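One detail of this selection step is worth spelling out: when several alphas reach the same maximum recall, `np.argmax` returns the first of them, which is the smallest alpha and therefore the largest tree. The sketch below illustrates an alternative tie-break that prefers the most heavily pruned tree. The `alphas` and `recall` arrays are made-up values for illustration, not results from this notebook.

```python
import numpy as np

# Hypothetical values ordered by increasing alpha (NOT the notebook's results).
alphas = np.array([0.000, 0.001, 0.004, 0.010])
recall = np.array([0.70, 0.77, 0.77, 0.72])

# np.argmax returns the first index of the maximum, i.e. the smallest tied alpha.
first_best = int(np.argmax(recall))

# Alternative: among all tied maxima, keep the last one (largest alpha, most pruning).
tied = np.flatnonzero(np.isclose(recall, recall.max()))
last_best = int(tied[-1])

print("argmax picks alpha:", alphas[first_best])
print("simplest tied alpha:", alphas[last_best])
```

Whether the simpler tree is preferable depends on the use case; the point is only that the tie-break is a choice rather than something `np.argmax` decides on merit.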
Checking model performance on training set
decision_tree_postpruned_perf_train = model_performance_classification_statsmodels(
    best_model, X_train, y_train
)
decision_tree_postpruned_perf_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.756903 | 0.762175 | 0.613324 | 0.679695 |
confusion_matrix_statsmodels(best_model, X_train, y_train)
Checking model performance on test set
decision_tree_postpruned_perf_test = model_performance_classification_statsmodels(
    best_model, X_test, y_test
)
decision_tree_postpruned_perf_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.755257 | 0.7669 | 0.618978 | 0.685045 |
confusion_matrix_statsmodels(best_model, X_test, y_test)
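With post-pruning, the model performs almost identically on the training and test sets (recall of about 0.76 and F1 of about 0.68 on both), so it generalizes well, although precision is modest at roughly 0.61 to 0.62.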
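One way to make "generalized" concrete is to compare the training and test metrics directly. The sketch below copies the post-pruned values from the two performance tables above and computes the absolute gap per metric. The 0.01 threshold is an arbitrary assumption chosen for illustration, not a rule from the notebook.

```python
# Metrics copied from the post-pruned performance tables above.
train = {"Accuracy": 0.756903, "Recall": 0.762175, "Precision": 0.613324, "F1": 0.679695}
test = {"Accuracy": 0.755257, "Recall": 0.766900, "Precision": 0.618978, "F1": 0.685045}

# Absolute train-test gap per metric; small gaps suggest little overfitting.
gaps = {name: abs(train[name] - test[name]) for name in train}
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:.4f}")

# Illustrative threshold (an assumption): flag the model if any gap exceeds 0.01.
largest_gap = max(gaps.values())
print("all gaps below 0.01:", largest_gap < 0.01)
```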
plt.figure(figsize=(10, 10))
out = tree.plot_tree(
    best_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=True,
    class_names=True,
)
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(best_model, feature_names=feature_names, show_weights=True))
|--- lead_time <= 150.50
|   |--- no_of_special_requests <= 0.50
|   |   |--- market_segment_type_Online <= 0.50
|   |   |   |--- weights: [3374.00, 351.00] class: 0
|   |   |--- market_segment_type_Online > 0.50
|   |   |   |--- weights: [3630.00, 3810.00] class: 1
|   |--- no_of_special_requests > 0.50
|   |   |--- weights: [11462.00, 2042.00] class: 0
|--- lead_time > 150.50
|   |--- weights: [1205.00, 3859.00] class: 1
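The leaf weights in this report are enough to rebuild the training confusion matrix by hand, which is a useful sanity check on the metrics reported earlier. The sketch below assumes that each `weights` pair holds training-sample counts ordered as [class 0, class 1] and that class 1 is the positive class; the `leaves` list is transcribed from the report and is not produced by any notebook helper.

```python
# Leaf weights from the text report: (n_class0, n_class1, predicted_class).
# Assumption: the weights are training-sample counts per class.
leaves = [
    (3374, 351, 0),
    (3630, 3810, 1),
    (11462, 2042, 0),
    (1205, 3859, 1),
]

# Samples in a leaf predicted as class 1 are TP (true class 1) or FP (true class 0);
# samples in a leaf predicted as class 0 are FN (true class 1) or TN (true class 0).
tp = sum(n1 for n0, n1, pred in leaves if pred == 1)
fp = sum(n0 for n0, n1, pred in leaves if pred == 1)
fn = sum(n1 for n0, n1, pred in leaves if pred == 0)
tn = sum(n0 for n0, n1, pred in leaves if pred == 0)

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.6f} recall={recall:.6f} precision={precision:.6f} f1={f1:.6f}")
```

Under these assumptions the four values agree with the post-pruned training metrics shown above (0.756903, 0.762175, 0.613324, 0.679695) to six decimal places.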
# importance of features in the tree building ( The importance of a feature is computed as the
# (normalized) total reduction of the 'criterion' brought by that feature. It is also known as the Gini importance )
print(
    pd.DataFrame(
        best_model.feature_importances_, columns=["Imp"], index=X_train.columns
    ).sort_values(by="Imp", ascending=False)
)
                                           Imp
lead_time                             0.599031
market_segment_type_Online            0.237032
no_of_special_requests                0.163938
no_of_week_nights                     0.000000
required_car_parking_space            0.000000
market_segment_type_Offline           0.000000
market_segment_type_Corporate         0.000000
market_segment_type_Complementary     0.000000
room_type_reserved_Room_Type 7        0.000000
room_type_reserved_Room_Type 6        0.000000
room_type_reserved_Room_Type 5        0.000000
room_type_reserved_Room_Type 4        0.000000
room_type_reserved_Room_Type 3        0.000000
room_type_reserved_Room_Type 2        0.000000
type_of_meal_plan_Not Selected        0.000000
type_of_meal_plan_Meal Plan 3         0.000000
type_of_meal_plan_Meal Plan 2         0.000000
no_of_children                        0.000000
avg_price_per_room                    0.000000
no_of_previous_bookings_not_canceled  0.000000
no_of_previous_cancellations          0.000000
repeated_guest                        0.000000
arrival_date                          0.000000
arrival_month                         0.000000
arrival_year                          0.000000
no_of_weekend_nights                  0.000000
no_of_adults                          0.000000
importances = best_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
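lead_time, market_segment_type_Online, and no_of_special_requests are the only features with non-zero importance in the post-pruned tree; lead_time alone accounts for about 60% of the total importance.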
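Because the pruned tree splits on only these three features, its rules from the text report can be written as a plain function, which may be easier to hand to a non-technical team than a fitted model object. This is a minimal sketch: the function name `predict_cancellation` is hypothetical, and it assumes that class 1 denotes a canceled booking.

```python
def predict_cancellation(lead_time, no_of_special_requests, market_segment_type_Online):
    """Return the class predicted by the post-pruned tree's rules.

    Assumption: class 1 means the booking is canceled, class 0 that it is not.
    Thresholds are copied from the text report of the pruned tree.
    """
    if lead_time > 150.5:
        return 1  # long lead time: predicted class 1
    if no_of_special_requests > 0.5:
        return 0  # short lead time with at least one special request
    # Short lead time and no special requests: the booking channel decides.
    return 1 if market_segment_type_Online > 0.5 else 0


# Example: an online booking made 30 days ahead with no special requests.
print(predict_cancellation(lead_time=30, no_of_special_requests=0, market_segment_type_Online=1))
```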
# training performance comparison
models_train_comp_df = pd.concat(
    [
        decision_tree_perf_train.T,
        decision_tree_tune_perf_train.T,
        decision_tree_postpruned_perf_train.T,
    ],
    axis=1,
)
models_train_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) |
|---|---|---|---|
| Accuracy | 0.996200 | 0.571621 | 0.756903 |
| Recall | 0.988770 | 0.966706 | 0.762175 |
| Precision | 1.000000 | 0.439559 | 0.613324 |
| F1 | 0.994353 | 0.604330 | 0.679695 |
# test performance comparison
models_test_comp_df = pd.concat(
    [
        decision_tree_perf_test.T,
        decision_tree_tune_perf_test.T,
        decision_tree_postpruned_perf_test.T,
    ],
    axis=1,
)
models_test_comp_df.columns = [
    "Decision Tree sklearn",
    "Decision Tree (Pre-Pruning)",
    "Decision Tree (Post-Pruning)",
]
print("Test set performance comparison:")
models_test_comp_df
Test set performance comparison:
| | Decision Tree sklearn | Decision Tree (Pre-Pruning) | Decision Tree (Post-Pruning) |
|---|---|---|---|
| Accuracy | 0.793393 | 0.574231 | 0.755257 |
| Recall | 0.702690 | 0.967217 | 0.766900 |
| Precision | 0.702214 | 0.447536 | 0.618978 |
| F1 | 0.702452 | 0.611930 | 0.685045 |
# import pandas_profiling
from pandas_profiling import ProfileReport
# Use the original dataframe, so that original features are considered
prof = ProfileReport(data)
# to view report created by pandas profile
prof